Results 1 - 10 of 53
The Zebra striped network file system - ACM Transactions on Computer Systems, 1995
Cited by 256 (5 self)
Zebra is a network file system that increases throughput by striping file data across multiple servers. Rather than striping each file separately, Zebra forms all the new data from each client into a single stream, which it then stripes using an approach similar to a log-structured file system. This provides high performance for writes of small files as well as for reads and writes of large files. Zebra also writes parity information in each stripe in the style of RAID disk arrays; this increases storage costs slightly but allows the system to continue operation even while a single storage server is unavailable. A prototype implementation of Zebra, built in the Sprite operating system, provides 4-5 times the throughput of the standard Sprite file system or NFS for large files and a 15% to 300% improvement for writing small files.
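The parity idea mentioned in this abstract can be illustrated with a small sketch. This is not Zebra's code: it only shows RAID-style XOR parity over one stripe of a client's log, and the function names (`xor_blocks`, `make_stripe`, `recover`), the fragment size, and the zero padding are all assumptions made for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(log: bytes, n_data: int, frag_size: int):
    """Split one stripe's worth of log data into n_data fragments plus parity.

    Hypothetical helper: each fragment would go to a different storage
    server, with the parity fragment stored on one more server.
    """
    if len(log) > n_data * frag_size:
        raise ValueError("log data does not fit in a single stripe")
    padded = log.ljust(n_data * frag_size, b"\0")  # assumed zero padding
    frags = [padded[i * frag_size:(i + 1) * frag_size] for i in range(n_data)]
    return frags + [xor_blocks(frags)]


def recover(stripe, missing: int) -> bytes:
    """Rebuild the fragment at index `missing` by XOR-ing the survivors."""
    survivors = [frag for i, frag in enumerate(stripe) if i != missing]
    return xor_blocks(survivors)


# Two small files appended to the same log share one stripe.
stripe = make_stripe(b"small file A" + b"small file B", n_data=3, frag_size=8)
lost = 1  # pretend the server holding fragment 1 is unavailable
rebuilt = recover(stripe, lost)
```

Because the parity fragment is the XOR of the data fragments, any single missing fragment equals the XOR of the remaining ones, which is why one unavailable server can be tolerated at the cost of one extra fragment per stripe.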
Building Secure and Reliable Network Applications, 1996
Cited by 209 (16 self)
ly, the remote procedure call problem, which an RPC protocol undertakes to solve, consists of emulating LPC using message passing. LPC has a number of "properties" -- a single procedure invocation results in exactly one execution of the procedure body, the result returned is reliably delivered to the invoker, and exceptions are raised if (and only if) an error occurs. Given a completely reliable communication environment, which never loses, duplicates, or reorders messages, and given client and server processes that never fail, RPC would be trivial to solve. The sender would merely package the invocation into one or more messages, and transmit these to the server. The server would unpack the data into local variables, perform the desired operation, and send back the result (or an indication of any exception that occurred) in a reply message. The challenge, then, is created by failures. Were it not for the possibility of process and machine crashes, an RPC protocol capable of overcomi...
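The failure-free case described in this excerpt (package the invocation into a message, unpack it at the server, run the operation, reply with a result or an exception indication) can be sketched as follows. This is an illustrative assumption rather than anything from the book: the JSON encoding, the `PROCEDURES` table, and the in-process `serve` call standing in for a perfectly reliable network are all hypothetical.

```python
import json

# Hypothetical procedures exported by the server.
PROCEDURES = {
    "add": lambda a, b: a + b,
    "div": lambda a, b: a / b,
}


class RemoteError(Exception):
    """Raised at the caller when the server reports an exception."""


def marshal_call(name, args) -> bytes:
    """Package the invocation into a single request message."""
    return json.dumps({"proc": name, "args": args}).encode()


def serve(request: bytes) -> bytes:
    """Unpack the request, perform the operation, and build the reply."""
    call = json.loads(request)
    try:
        result = PROCEDURES[call["proc"]](*call["args"])
        reply = {"ok": True, "result": result}
    except Exception as exc:  # report any exception back to the caller
        reply = {"ok": False, "error": type(exc).__name__}
    return json.dumps(reply).encode()


def rpc(name, *args, transport=serve):
    """Client stub. `transport` is assumed never to lose, duplicate,
    or reorder messages, and neither side is assumed to crash."""
    reply = json.loads(transport(marshal_call(name, list(args))))
    if not reply["ok"]:
        raise RemoteError(reply["error"])
    return reply["result"]


total = rpc("add", 2, 3)
```

The sketch deliberately contains no timeouts, retries, or duplicate suppression; as the excerpt notes, the difficulty of RPC comes from failures, which this idealized version ignores.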
Using Process Groups to Implement Failure Detection in Asynchronous Environments, 1991
Cited by 157 (15 self)
Agreement on the membership of a group of processes in a distributed system is a basic problem that arises in a wide range of applications. Such groups occur when a set of processes co-operate to perform some task, share memory, monitor one another, subdivide a computation, and so forth. In this paper we discuss the Group Membership Problem as it relates to failure detection in asynchronous, distributed systems. We present a rigorous, formal specification for group membership under this interpretation. We then present a solution for this problem that improves upon previous work.
Understanding the Limitations of Causally and Totally Ordered Communication - In Proceedings of the 14th ACM Symposium on Operating Systems Principles, 1993
Cited by 139 (1 self)
Causally and totally ordered communication support (CATOCS) has been proposed as important to provide as part of the basic building blocks for constructing reliable distributed systems. In this paper, we identify four major limitations to CATOCS, investigate the applicability of CATOCS to several classes of distributed applications in light of these limitations, and the potential impact of these facilities on communication scalability and robustness. From this investigation, we find limited merit and several potential problems in using CATOCS. The fundamental difficulty with the CATOCS is that it attempts to solve state problems at the communication level in violation of the
Implementation of the Ficus Replicated File System - In USENIX Conference Proceedings, 1990
Cited by 114 (23 self)
As we approach nation-wide integration of computer systems, it is clear that file replication will play a key role, both to improve data availability in the face of failures, and to improve performance by locating data near where it will be used. We expect that future file systems will have an extensible, modular structure in which features such as replication can be "slipped in" as a transparent layer in a stackable layered architecture. We introduce the Ficus replicated file system for NFS and show how it is layered on top of existing file systems. The Ficus file system differs from previous file replication services in that it permits update during network partition if any copy of a file is accessible. File and directory updates are automatically propagated to accessible replicas. Conflicting updates to directories are detected and automatically repaired; conflicting updates to ordinary files are detected and reported to the owner. The frequency of communications outages rendering i...
Using Smart Clients to Build Scalable Services - In Proceedings of the 1997 USENIX Technical Conference, 1997
Cited by 111 (10 self)
Individual machines are no longer sufficient to handle the offered load to many Internet sites. To use multiple machines for scalable performance, load balancing, fault transparency, and backward compatibility with URL naming must be addressed. A number of approaches have been developed to provide transparent access to multi-server Internet services, including HTTP redirect, DNS aliasing, Magic Routers, and Active Networks. Recently, however, portable Java code and lightly loaded client machines allow the migration of certain service functionality onto the client. In this paper, we argue that in many instances, a client-side approach to providing transparent access to Internet services provides increased flexibility and performance over the existing solutions. We describe the design and implementation of Smart Clients and show how our system can be used to provide transparent access to scalable and/or highly available network services, including prototypes for telnet, FTP, and an Internet chat application.
The ISIS Project: Real experience with a fault tolerant programming system - Operating Systems Review, 1991
Cited by 70 (7 self)
Perspectives on Optimistically Replicated, Peer-to-Peer Filing, 1997
Cited by 61 (6 self)
This paper details and evaluates the use of optimistic replica consistency, automatic update conflict detection and repair, the peer-to-peer (as opposed to client-server) interaction model, and the stackable file system architecture in the design and construction of Ficus. The paper concludes with a number of lessons learned from the experience of designing, building, measuring, and living with an optimistically replicated file system.
A Highly Available Network File Server - In Proceedings of the Winter USENIX Conference, 1991
Cited by 48 (1 self)
This paper presents the design and implementation of a Highly Available Network File Server (HA-NFS). We separate the problem of network file server reliability into three different subproblems: server reliability, disk reliability, and network reliability. HA-NFS offers a different solution for each: dual-ported disks and impersonation are used to provide server reliability, disk mirroring can be used to provide disk reliability, and optional network replication can be used to provide network reliability. The implementation shows that HA-NFS provides high availability without the excessive resource overhead or the performance degradation that characterize traditional replication methods. Ongoing operations are not aborted during fail-over and recovery is completely transparent to applications. HA-NFS adheres to the NFS protocol standard and can be used by existing NFS clients without modification.
Ficus: A Very Large Scale Reliable Distributed File System - University of California, Los Angeles, 1991
Cited by 45 (7 self)
The dissertation presents the issues addressed in the design of Ficus, a large scale wide area distributed file system currently operational on a modest scale at UCLA. Key aspects of providing such a service include toleration of partial operation in virtually all areas; support for large scale, optimistic data replication; and a flexible, extensible modular design. Ficus incorporates a "stackable layers" modular architecture and full support for optimistic replication. Replication is provided by a pair of layers operating in concert above a traditional filing service. A "volume" abstraction and on-the-fly volume "grafting" mechanism are used to manage the large scale file name space. The replication service uses a f...

