Results 1 - 5 of 5
Designing Disk Arrays for High Data Reliability
Cited by 51 (9 self)
Redundancy based on a parity encoding has been proposed for insuring that disk arrays provide highly reliable data. Parity-based redundancy will tolerate many independent and dependent disk failures (shared support hardware) without on-line spare disks and many more such failures with on-line spare disks. This paper explores the design of reliable, redundant disk arrays. In the context of a 70 disk strawman array, it presents and applies analytic and simulation models for the time until data is lost. It shows how to balance requirements for high data reliability against the overhead cost of redundant data, on-line spares, and on-site repair personnel in terms of an array’s architecture, its component reliabilities, and its repair policies.
Commercial Fault Tolerance: A Tale of Two Systems
IEEE Transactions on Dependable and Secure Computing, 2004
Cited by 37 (0 self)
Abstract—This paper compares and contrasts the design philosophies and implementations of two computer system families: the IBM S/360 and its evolution to the current zSeries line, and the Tandem (now HP) NonStop Server. Both systems have a long history; the initial IBM S/360 machines were shipped in 1964, and the Tandem NonStop System was first shipped in 1976. They were aimed at similar markets, what would today be called enterprise-class applications. The requirement for the original S/360 line was for very high availability; the requirement for the NonStop platform was for single fault tolerance against unplanned outages. Since their initial shipments, availability expectations for both platforms have continued to rise and the system designers and developers have been challenged to keep up. There were and still are many similarities in the design philosophies of the two lines, including the use of redundant components and extensive error checking. The primary difference is that the S/360-zSeries focus has been on localized retry and restore to keep processors functioning as long as possible, while the NonStop developers have based systems on a loosely coupled multiprocessor design that supports a “fail-fast” philosophy implemented through a combination of hardware and software, with workload being actively taken over by another resource when one fails. Index Terms—Computer systems implementation, fault tolerance, high availability.
On-Line Data Reconstruction In Redundant Disk Arrays
1994
Cited by 18 (1 self)
There exists a wide variety of applications in which data availability must be continuous, that is, where the system is never taken off-line and any interruption in the accessibility of stored data causes significant disruption in the service provided by the application. Examples include on-line transaction processing systems such as airline reservation systems and automated teller networks in banking systems. In addition, there exist many applications for which a high degree of data availability is important, but continuous operation is not required. An example is a research and development environment, where access to a centrally-stored CAD system is often necessary to make progress on a design project. These applications and many others mandate both high performance and high availability from their storage subsystems. Redundant disk arrays are systems in which a high level of I/O performance is obtained by grouping together a large number of small disks, rather than building one lar...
Dependability Analysis of Fault-Tolerant Multiprocessor Architectures through Simulated Fault Injection (Chapter 5 and 6)
1993
Introduction: Computer systems achieve fault-tolerance primarily through redundancy. Multiple versions of a software routine can be executed to overcome implementation errors in the application code. Hardware can be replicated and operated in parallel or sequentially, as a series of spares, to survive logic faults. Redundant software is expensive to develop, and increases memory requirements and execution time. Redundant hardware is difficult to design, and adds to the cost, size, weight and power consumption of a machine. Many fault-tolerant applications, such as the control of fly-by-wire aircraft and deep space probes, have physical limitations on the amount of redundancy that can be incorporated into a system. Cost is always a consideration when adding redundancy to improve fault-tolerance. The level of redundancy needed is determined by dependability requirements and the nature of the faults and errors that can be expected to affect a system. The behavior of processors in th
RAIDframe: A Rapid Prototyping Tool for RAID Systems
1997
Redundant disk arrays provide highly-available, high-performance disk storage to a wide variety of applications. Because these applications often have distinct cost, performance, capacity, and availability requirements, researchers continue to develop new array architectures. RAIDframe was developed to assist researchers in the implementation and evaluation of these new architectures. It was designed specifically to reduce the burden of implementation by restricting code changes to mapping, algorithms and other functions that are known to be specific to an array architecture. Algorithms are executed using a general mechanism which automates the recovery from device errors, such as a failed disk read. RAIDframe enables a single implementation to be evaluated in a self-contained simulator, or against real disks as either a user process or a functional device driver.

