Results 1 - 10
of
49
Algorithms for Parallel Memory I: Two-Level Memories
, 1992
"... We provide the first optimal algorithms in terms of the number of input/outputs (I/Os) required between internal memory and multiple secondary storage devices for the problems of sorting, FFT, matrix transposition, standard matrix multiplication, and related problems. Our two-level memory model is n ..."
Abstract
-
Cited by 226 (32 self)
- Add to MetaCart
We provide the first optimal algorithms in terms of the number of input/outputs (I/Os) required between internal memory and multiple secondary storage devices for the problems of sorting, FFT, matrix transposition, standard matrix multiplication, and related problems. Our two-level memory model is new and gives a realistic treatment of parallel block transfer, in which during a single I/O each of the P secondary storage devices can simultaneously transfer a contiguous block of B records. The model pertains to a large-scale uniprocessor system or parallel multiprocessor system with P disks. In addition, the sorting, FFT, permutation network, and standard matrix multiplication algorithms are typically optimal in terms of the amount of internal processing time. The difficulty in developing optimal algorithms is to cope with the partitioning of memory into P separate physical devices. Our algorithms' performance can be significantly better than those obtained by the well-known but nonopti...
External-Memory Computational Geometry
, 1993
"... In this paper, we give new techniques for designing efficient algorithms for computational geometry problems that are too large to be solved in internal memory, and we use these techniques to develop optimal and practical algorithms for a number of important largescale problems. We discuss our algor ..."
Abstract
-
Cited by 117 (20 self)
- Add to MetaCart
In this paper, we give new techniques for designing efficient algorithms for computational geometry problems that are too large to be solved in internal memory, and we use these techniques to develop optimal and practical algorithms for a number of important largescale problems. We discuss our algorithms primarily in the contex't of single processor/single disk machines, a domain in which they are not only the first known optimal results but also of tremendous practical value. Our methods also produce the first known optimal algorithms for a wide range of two-level and hierarchical muir{level memory models, including parallel models. The algorithms are optimal both in terms of I/0 cost and internal computation.
LH*RS -- a high-availability scalable distributed data structure
"... (SDDS). An LH*RS file is hash partitioned over the distributed RAM of a multicomputer, e.g., a network of PCs, and supports the unavailability of any of its k ≥ 1 server nodes. The value of k transparently grows with the file to offset the reliability decline. Only the number of the storage nodes p ..."
Abstract
-
Cited by 53 (9 self)
- Add to MetaCart
(SDDS). An LH*RS file is hash partitioned over the distributed RAM of a multicomputer, e.g., a network of PCs, and supports the unavailability of any of its k ≥ 1 server nodes. The value of k transparently grows with the file to offset the reliability decline. Only the number of the storage nodes potentially limits the file growth. The high-availability management uses a novel parity calculus that we have developed, based on the Reed-Salomon erasure correcting coding. The resulting parity storage overhead is about the minimal ever possible. The parity encoding and decoding are faster than for any other candidate coding we are aware of. We present our scheme and its performance analysis, including experiments with a prototype implementation on Wintel PCs. The capabilities of LH*RS offer new perspectives to data intensive applications, including the emerging ones of grids and of P2P computing.
Tolerating Multiple Failures in RAID Architectures with Optimal Storage and Uniform Declustering
- In Proceedings of the 24th International Symposium on Computer Architecture
, 1996
"... We present Datum, a novel method for tolerating multiple disk failures in disk arrays. Datum is the first known method that can mask any given number of failures, requires an optimal amount of redundant storage space, and spreads reconstruction accesses uniformly over disks in the presence of failur ..."
Abstract
-
Cited by 47 (5 self)
- Add to MetaCart
We present Datum, a novel method for tolerating multiple disk failures in disk arrays. Datum is the first known method that can mask any given number of failures, requires an optimal amount of redundant storage space, and spreads reconstruction accesses uniformly over disks in the presence of failures without needing large layout tables in controller memory. Our approach is based on information dispersal, a coding technique that admits an efficient hardware implementation. As the method does not restrict the configuration parameters of the disk array, many existing RAID organizations are particular cases of Datum. A detailed performance comparison with two other approaches shows that Datum's response times are similar to those of the best competitor when two or less disks fail, and that the performance degrades gracefully when more than two disks fail. 1 Introduction Disk arrays [15] offer significant advantages over conventional disks. Fragmentation of the total storage space into ...
Fast Concurrent Access to Parallel Disks
- In 11th ACM-SIAM Symposium on Discrete Algorithms
, 1999
"... High performance applications involving large data sets require the efficient and flexible use of multiple disks. In an external memory machine with D parallel, independent disks, only one block can be accessed on each disk in one I/O step. This restriction leads to a load balancing problem that is ..."
Abstract
-
Cited by 44 (11 self)
- Add to MetaCart
High performance applications involving large data sets require the efficient and flexible use of multiple disks. In an external memory machine with D parallel, independent disks, only one block can be accessed on each disk in one I/O step. This restriction leads to a load balancing problem that is perhaps the main inhibitor for adapting single-disk external memory algorithms to multiple disks. This paper shows that this problem can be solved efficiently using a combination of randomized placement, redundancy and an optimal scheduling algorithm. A buffer of O(D) blocks suffices to support efficient writing of arbitrary blocks if blocks are distributed uniformly at random to the disks (e.g., by hashing). If two randomly allocated copies of each block exist, N arbitrary blocks can be read within dN=De + 1 I/O steps with high probability. In addition, the redundancy can be reduced from 2 to 1 + 1=r for any integer r. These results can be used to emulate the simple and powerful "single-disk multi-head" model of external computing [1] on the physically more realistic independent disk model [33] with small constant overhead. This is faster than a lower bound for deterministic emulation [3].
Disk scrubbing in large archival storage systems
- In Proceedings of the 12th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS ’04
, 2004
"... Large archival storage systems experience long periods of idleness broken up by rare data accesses. In such systems, disks may remain powered off for long periods of time. These systems can lose data for a variety of reasons, including failures at both the device level and the block level. To deal w ..."
Abstract
-
Cited by 44 (11 self)
- Add to MetaCart
Large archival storage systems experience long periods of idleness broken up by rare data accesses. In such systems, disks may remain powered off for long periods of time. These systems can lose data for a variety of reasons, including failures at both the device level and the block level. To deal with these failures, we must detect them early enough to be able to use the redundancy built into the storage system. We propose a process called “disk scrubbing” in a system in which drives are periodically accessed to detect drive failure. By scrubbing all of the data stored on all of the disks, we can detect block failures and compensate for them by rebuilding the affected blocks. Our research shows how the scheduling of disk scrubbing affects overall system reliability, and that “opportunistic ” scrubbing, in which the system scrubs disks only when they are powered on for other reasons, performs very well without the need to power on disks solely to check them. 1.
AIDA-based Real-Time Fault-Tolerant Broadcast Disks
- In Proceedings of RTAS'96: The 1996 IEEE Real-Time Technology and Applications Symposium
, 1996
"... The proliferation of mobile computers and wireless networks requires the design of future distributed real-time applications to recognize and deal with the significant asymmetry between downstream and upstream communication capacities, and the significant disparitybetween server and client storag ..."
Abstract
-
Cited by 40 (12 self)
- Add to MetaCart
The proliferation of mobile computers and wireless networks requires the design of future distributed real-time applications to recognize and deal with the significant asymmetry between downstream and upstream communication capacities, and the significant disparitybetween server and client storage capacities.
MDS Array Codes with Independent Parity Symbols
- IEEE Trans. on Information Theory
, 1996
"... Abstruct- A new family of MDS array codes is presenled. The code arrays contain p information columns and T indelpendent parity columns, each column consisting of p- 1 bits, where p is a prime. We extend a previously known construction for 1 he case T = 2 to three and more parity columns. It is show ..."
Abstract
-
Cited by 37 (10 self)
- Add to MetaCart
Abstruct- A new family of MDS array codes is presenled. The code arrays contain p information columns and T indelpendent parity columns, each column consisting of p- 1 bits, where p is a prime. We extend a previously known construction for 1 he case T = 2 to three and more parity columns. It is shown that when r = 3 such extension is possible for any prime p. For larger values of T, we give necessary and sufficient conditions for our codes to be MDS, and then prove that if p belongs to a certain class of primes these conditions are satisfied up to T 5 8. One of the advantages of the new codes is that encoding and decoding may be accomplished using simple cyclic shifts and XOR operations on the columns of the code array. We devellop efficient decoding procedures for the case of two- and three-column erroirs. This again extends the previously known results for the case of ii single-column error. Another primary advantage of our codes is related to the problem of efficient information updates. We present upper and lower bounds on the average number of parity bits which have to be updated in an MDS code over GF (2"), following an update in a single information bit. This average number is of importance in many storage applications wlhich require frequent updates of information. We show that the upper bound obtained from our codes is close to the lower bound and, most importantly, does not depend on the size of the code syimbols. Index Terms- MDS codes, array codes, efficient dwoding, efficient information updates. I.
Selecting RAID levels for disk arrays
, 2002
"... Disk arrays have a myriad of configuration parameters that interact in counter-intuitive ways, and those interactions can have significant impacts on cost, performance, and reliability. Even after values for these parameters have been chosen, there are exponentially-many ways to map data onto the di ..."
Abstract
-
Cited by 30 (7 self)
- Add to MetaCart
Disk arrays have a myriad of configuration parameters that interact in counter-intuitive ways, and those interactions can have significant impacts on cost, performance, and reliability. Even after values for these parameters have been chosen, there are exponentially-many ways to map data onto the disk arrays' logical units. Meanwhile, the importance of correct choices is increasing: storage systems represent an growing fraction of total system cost, they need to respond more rapidly to changing needs, and there is less and less tolerance for mistakes. We believe that automatic design and configuration of storage systems is the only viable solution to these issues. To that end, we present a comparative study of a range of techniques for programmatically choosing the RAID levels to use in a disk array. Our simplest approaches are modeled on existing, manual rules of thumb: they "tag" data with a RAID level before determining the configuration of the array to which it is assigned. Our best approach simultaneously determines the RAID levels for the data, the array configuration, and the layout of data on that array. It operates as an optimization process with the twin goals of minimizing array cost while ensuring that storage workload performance requirements will be met. This approach produces robust solutions with an average cost/performance 14-- 17% better than the best results for the tagging schemes, and up to 150--200% better than their worst solutions. We believe that this is the first presentation and systematic analysis of a variety of novel, fully-automatic RAID- level selection techniques. 1
Evaluation of Distributed Recovery in Large-Scale Storage Systems
- IN PROCEEDINGS OF THE 13TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE DISTRIBUTED COMPUTING (HPDC
, 2004
"... Storage clusters consisting of thousands of disk drives are now being used both for their large capacity and high throughput. However, their reliability is far worse than that of smaller storage systems due to the increased number of storage nodes. RAID technology is no longer sufficient to guarante ..."
Abstract
-
Cited by 29 (8 self)
- Add to MetaCart
Storage clusters consisting of thousands of disk drives are now being used both for their large capacity and high throughput. However, their reliability is far worse than that of smaller storage systems due to the increased number of storage nodes. RAID technology is no longer sufficient to guarantee the necessary high data reliability for such systems, because disk rebuild time lengthens as disk capacity grows. In this paper, we present FAst Recovery Mechanism (FARM), a distributed recovery approach that exploits excess disk capacity and reduces data recovery time. FARM works in concert with replication and erasure-coding redundancy schemes to dramatically lower the probability of data loss in large-scale storage systems. We have examined essential factors that influence system reliability, performance, and costs, such as failure detections, disk bandwidth usage for recovery, disk space utilization, disk drive replacement, and system scales, by simulating system behavior under disk failures. Our results show the reliability improvement from FARM and demonstrate the impacts of various factors on system reliability. Using our techniques, system designers will be better able to build multi-petabyte storage systems with much higher reliability at lower cost than previously possible.

