Results 1 - 10
of
97
Tapestry: A Resilient Global-scale Overlay for Service Deployment
- IEEE Journal on Selected Areas in Communications
, 2004
"... We present Tapestry, a peer-to-peer overlay routing infrastructure offering efficient, scalable, locationindependent routing of messages directly to nearby copies of an object or service using only localized resources. Tapestry supports a generic Decentralized Object Location and Routing (DOLR) API ..."
Abstract
-
Cited by 374 (13 self)
- Add to MetaCart
We present Tapestry, a peer-to-peer overlay routing infrastructure offering efficient, scalable, locationindependent routing of messages directly to nearby copies of an object or service using only localized resources. Tapestry supports a generic Decentralized Object Location and Routing (DOLR) API using a self-repairing, softstate based routing layer. This paper presents the Tapestry architecture, algorithms, and implementation. It explores the behavior of a Tapestry deployment on PlanetLab, a global testbed of approximately 100 machines. Experimental results show that Tapestry exhibits stable behavior and performance as an overlay, despite the instability of the underlying network layers. Several widely-distributed applications have been implemented on Tapestry, illustrating its utility as a deployment infrastructure.
OpenDHT: A Public DHT Service and Its Uses
, 2005
"... Large-scale distributed systems are hard to deploy, and distributed hash tables (DHTs) are no exception. To lower the barriers facing DHT-based applications, we have created a public DHT service called OpenDHT. Designing a DHT that can be widely shared, both among mutually untrusting clients and amo ..."
Abstract
-
Cited by 159 (16 self)
- Add to MetaCart
Large-scale distributed systems are hard to deploy, and distributed hash tables (DHTs) are no exception. To lower the barriers facing DHT-based applications, we have created a public DHT service called OpenDHT. Designing a DHT that can be widely shared, both among mutually untrusting clients and among a variety of applications, poses two distinct challenges. First, there must be adequate control over storage allocation so that greedy or malicious clients do not use more than their fair share. Second, the interface to the DHT should make it easy to write simple clients, yet be sufficiently general to meet a broad spectrum of application requirements. In this paper we describe our solutions to these design challenges. We also report our early deployment experience with OpenDHT and describe the variety of applications already using the system.
Secure Untrusted Data Repository (SUNDR)
"... We have implemented a secure network file system called SUNDR that guarantees the integrity of data even when malicious parties control the server. SUNDR splits storage functionality between two untrusted components, a block store and a consistency server. The block store holds all file data and mos ..."
Abstract
-
Cited by 111 (2 self)
- Add to MetaCart
We have implemented a secure network file system called SUNDR that guarantees the integrity of data even when malicious parties control the server. SUNDR splits storage functionality between two untrusted components, a block store and a consistency server. The block store holds all file data and most metadata. Without interpreting metadata, it presents a simple interface for clients to store variable-sized data blocks and later retrieve them by cryptographic hash.
Shark: Scaling file servers via cooperative caching
- In Proc NSDI
, 2005
"... Abstract Network file systems offer a powerful, transparent inter-face for accessing remote data. Unfortunately, in current ..."
Abstract
-
Cited by 72 (4 self)
- Add to MetaCart
Abstract Network file systems offer a powerful, transparent inter-face for accessing remote data. Unfortunately, in current
Ursa Minor: versatile cluster-based storage
, 2005
"... No single encoding scheme or fault model is optimal for all data. A versatile storage system allows them to be matched to access patterns, reliability requirements, and cost goals on a per-data item basis. Ursa Minor is a cluster-based storage system that allows data-specific selection of, and on-li ..."
Abstract
-
Cited by 56 (30 self)
- Add to MetaCart
No single encoding scheme or fault model is optimal for all data. A versatile storage system allows them to be matched to access patterns, reliability requirements, and cost goals on a per-data item basis. Ursa Minor is a cluster-based storage system that allows data-specific selection of, and on-line changes to, encoding schemes and fault models. Thus, different data types can share a scalable storage infrastructure and still enjoy specialized choices, rather than suffering from "one size fits all." Experiments with Ursa Minor show performance benefits of 2--3 when using specialized choices as opposed to a single, more general, configuration. Experiments also show that a single cluster supporting multiple workloads simultaneously is much more efficient when the choices are specialized for each distribution rather than forced to use a "one size fits all" configuration. When using the specialized distributions, aggregate cluster throughput nearly doubled.
Minimizing Churn in Distributed Systems
- IN PROC. ACM SIGCOMM
, 2006
"... A pervasive requirement of distributed systems is to deal with churn -- change in the set of participating nodes due to joins, graceful leaves, and failures. A high churn rate can increase costs or decrease service quality. This paper studies how to reduce churn by selecting which subset of a set of ..."
Abstract
-
Cited by 44 (3 self)
- Add to MetaCart
A pervasive requirement of distributed systems is to deal with churn -- change in the set of participating nodes due to joins, graceful leaves, and failures. A high churn rate can increase costs or decrease service quality. This paper studies how to reduce churn by selecting which subset of a set of available nodes to use. First,
Replication under Scalable Hashing: A Family of Algorithms for Scalable Decentralized Data Distribution
- In Proceedings of the 18th International Parallel & Distributed Processing Symposium (IPDPS 2004), Santa Fe, NM
, 2004
"... Typical algorithms for decentralized data distribution work best in a system that is fully built before it first used; adding or removing components results in either extensive reorganization of data or load imbalance in the system. We have developed a family of decentralized algorithms, RUSH (Repl ..."
Abstract
-
Cited by 43 (13 self)
- Add to MetaCart
Typical algorithms for decentralized data distribution work best in a system that is fully built before it first used; adding or removing components results in either extensive reorganization of data or load imbalance in the system. We have developed a family of decentralized algorithms, RUSH (Replication Under Scalable Hashing), that maps replicated objects to a scalable collection of storage servers or disks. RUSH algorithms distribute objects to servers according to user-specified server weighting. While all RUSH variants support addition of servers to the system, different variants have different characteristics with respect to lookup time in petabyte-scale systems, performance with mirroring (as opposed to redundancy codes), and storage server removal. All RUSH variants redistribute as few objects as possible when new servers are added or existing servers are removed, and all variants guarantee that no two replicas of a particular object are ever placed on the same server. Because there is no central directory, clients can compute data locations in parallel, allowing thousands of clients to access objects on thousands of servers simultaneously.
PRACTI replication
- IN PROC NSDI
, 2006
"... We present PRACTI, a new approach for large-scale replication. PRACTI systems can replicate or cache any subset of data on any node (Partial Replication), provide a broad range of consistency guarantees (Arbitrary Consistency), and permit any node to send information to any other node (Topology Inde ..."
Abstract
-
Cited by 41 (14 self)
- Add to MetaCart
We present PRACTI, a new approach for large-scale replication. PRACTI systems can replicate or cache any subset of data on any node (Partial Replication), provide a broad range of consistency guarantees (Arbitrary Consistency), and permit any node to send information to any other node (Topology Independence). A PRACTI architecture yields two significant advantages. First, by providing all three PRACTI properties, it enables better trade-offs than existing mechanisms that support at most two of the three desirable properties. The PRACTI approach thus exposes new points in the design space for replication systems. Second, the flexibility of PRACTI protocols simplifies the design of replication systems by allowing a single architecture to subsume a broad range of existing systems and to reduce development costs for new ones. To illustrate both advantages, we use our PRACTI prototype to emulate existing server replication, client-server, and object replication systems and to implement novel policies that improve performance for mobile users, web edge servers, and grid computing by as much as an order of magnitude.
JuxMem: An adaptive supportive platform for data sharing on the grid
- In Proceedings Workshop on Adaptive Grid Middleware (AGRIDM 2003
, 2003
"... We address the challenge of managing large amounts of numerical data within computing grids consisting of a federation of clusters. We claim that storing, accessing, updating and sharing such data should be considered by applications as an external service. We propose a hierarchical architecture for ..."
Abstract
-
Cited by 38 (16 self)
- Add to MetaCart
We address the challenge of managing large amounts of numerical data within computing grids consisting of a federation of clusters. We claim that storing, accessing, updating and sharing such data should be considered by applications as an external service. We propose a hierarchical architecture for this service, based on a peer-to-peer approach. This architecture is illustrated through a software platform called JUXMEM (for Juxtaposed Memory), which provides transparent access to mutable data, while enhancing data persistence in a dynamic environment. Managing the volatility of storage resources is specially emphasized. As a proof of concept, we describe a prototype implementation on top of the JXTA peer-to-peer framework, and we report on a preliminary experimental evaluation. 1.
Network Coding for Distributed Storage Systems
- In Proc. of IEEE INFOCOM
, 2007
"... Distributed storage systems provide reliable access to data through redundancy spread over individually unreliable nodes. Application scenarios include data centers, peer-to-peer storage systems, and storage in wireless networks. Storing data using an erasure code, in fragments spread across nodes, ..."
Abstract
-
Cited by 35 (3 self)
- Add to MetaCart
Distributed storage systems provide reliable access to data through redundancy spread over individually unreliable nodes. Application scenarios include data centers, peer-to-peer storage systems, and storage in wireless networks. Storing data using an erasure code, in fragments spread across nodes, requires less redundancy than simple replication for the same level of reliability. However, since fragments must be periodically replaced as nodes fail, a key question is how to generate encoded fragments in a distributed way while transferring as little data as possible across the network. For an erasure coded system, a common practice to repair from a node failure is for a new node to download subsets of data stored at a number of surviving nodes, reconstruct a lost coded block using the downloaded data, and store it at the new node. We show that this procedure is sub-optimal. We introduce the notion of regenerating codes, which allow a new node to download functions of the stored data from the surviving nodes. We show that regenerating codes can significantly reduce the repair bandwidth. Further, we show that there is a fundamental tradeoff between storage and repair bandwidth which we theoretically characterize using flow arguments on an appropriately constructed graph. By invoking constructive results in network coding, we introduce regenerating codes that can achieve any point in this optimal tradeoff. I.

