The Case for Resilient Overlay Networks
- in Proceedings of the 8th Annual Workshop on Hot Topics in Operating Systems (HotOS-VIII)
, 2001
Cited by 65 (0 self)
In this paper, we motivate and describe the architecture of Resilient Overlay Networks (RON), an application-level packet forwarding service that gives end-hosts and applications the ability to take advantage of network paths that traditional Internet routing cannot make use of, thereby improving their end-to-end reliability and performance. A RON system consists of a per-host forwarding and routing system; programs to measure the quality of paths between participating hosts; and mechanisms for interpreting this measured data and making routing decisions based upon that interpretation. RONs are usable as a purely user-level library system, with kernel support for packet encapsulation, or as a router to overlay entire leaf networks. We explain the reasons for the architectural design of RON, and argue that end-host controlled Resilient Overlay Networks provide a good framework for distributed applications to transmit data with greater robustness and higher performance over the wide-area...
The ϕ accrual failure detector
- RR IS-RR-2004-010, Japan Advanced Institute of Science and Technology
, 2004
Cited by 32 (7 self)
Traditionally, failure detectors have considered a binary model whereby a given process can be either trusted or suspected. This paper defines a family of failure detectors, called accrual failure detectors, that revisits this interaction model. Accrual failure detectors associate with each process a real value representing a suspicion level. An important advantage of accrual failure detectors over binary ones is that they allow distributed applications to trigger different actions depending on the suspicion level. For instance, an application can take precautionary measures when the suspicion level reaches a given level, and then take more drastic actions after it rises above a second (much higher) level. The paper defines accrual failure detectors and their basic properties. Four classes of accrual failure detectors are discussed, each of which is proved equivalent to a class of binary unreliable failure detectors (P, S, ♦P, and ♦S).
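The idea of a graded suspicion level can be illustrated with a minimal sketch. It is not the detector defined in the paper: the class `AccrualDetector`, the helper `action_for`, the exponential model of heartbeat gaps, and the thresholds `warn` and `fail` are all assumptions made for illustration.

```python
import math


class AccrualDetector:
    """Toy accrual detector: outputs a suspicion level, not a trust/suspect verdict."""

    def __init__(self):
        self.intervals = []       # observed gaps between consecutive heartbeats
        self.last_arrival = None  # time of the most recent heartbeat

    def heartbeat(self, now):
        """Record a heartbeat that arrived at time `now`."""
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def suspicion(self, now):
        """Suspicion level at time `now`; grows while no heartbeat arrives."""
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        elapsed = now - self.last_arrival
        # Assumed model: gaps are exponential, so P(gap > elapsed) = exp(-elapsed / mean).
        # The level is -log10 of that probability.
        return elapsed / (mean * math.log(10))


def action_for(level, warn=1.0, fail=3.0):
    """Map a suspicion level to an application action (hypothetical thresholds)."""
    if level >= fail:
        return "failover"
    if level >= warn:
        return "precaution"
    return "ok"


detector = AccrualDetector()
for t in (0.0, 1.0, 2.0, 3.0):
    detector.heartbeat(t)

for now in (3.5, 6.0, 10.0):
    print(now, action_for(detector.suspicion(now)))
```

The application, not the detector, chooses the thresholds, which is what lets one detector drive both a cautious reaction and a drastic one.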
A Cooperative File System
- Master’s thesis, Massachusetts Institute of Technology (MIT)
, 2001
Cited by 17 (2 self)
The Cooperative File System (CFS) is a new peer-to-peer read-only storage system that provides provable guarantees for the efficiency, robustness, and load-balance of file storage and retrieval. CFS does this with a completely decentralized architecture that can scale to large systems. CFS servers provide a distributed hash table (DHash) for block storage. CFS clients interpret DHash blocks as a file system. DHash distributes and caches blocks at a fine granularity to achieve load balance, uses replication for robustness, and decreases latency with server selection. DHash finds blocks using the Chord location protocol, which operates in time logarithmic in the number of servers and requires logarithmic state at each node. CFS is implemented using the SFS file system toolkit and runs on many UNIX operating systems including Linux, OpenBSD, and FreeBSD. Experience on a globally deployed prototype shows that CFS delivers data to clients as fast as FTP. Controlled tests show that CFS is able to route queries in a scalable way. For example, in experiments with a system of 4,096 servers, looking up a block of data involves contacting only seven servers. In general, a logarithmic number of servers must be contacted to route a query. Servers are also able to join and leave the system efficiently. Tests demonstrate nearly perfect robustness and unimpaired performance even when as many as half the servers fail.
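The logarithmic lookup cost can be illustrated with a small sketch of finger-style routing on an identifier ring. It is a simplification, not the Chord protocol as implemented in CFS: it assumes a fully populated ring (one node per identifier) and that every node can jump ahead by any power of two. The function names `lookup_hops` and `average_hops` are hypothetical.

```python
def lookup_hops(start, key, m):
    """Count hops from node `start` to the node owning `key` on a ring of 2**m ids."""
    size = 1 << m
    node, hops = start, 0
    while node != key:
        distance = (key - node) % size
        # Follow the longest finger that does not overshoot the key.
        step = 1 << (distance.bit_length() - 1)
        node = (node + step) % size
        hops += 1
    return hops


def average_hops(m):
    """Average hop count from node 0 to every identifier on the ring."""
    size = 1 << m
    return sum(lookup_hops(0, key, m) for key in range(size)) / size


m = 12  # 2**12 = 4,096 nodes
print("worst case:", max(lookup_hops(0, key, m) for key in range(1 << m)))
print("average:", average_hops(m))
```

Each hop at least halves the remaining distance, so no lookup in this model takes more than `m` hops, which is log2 of the number of nodes.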
Census: Location-Aware Membership Management for Large-Scale Distributed Systems
Cited by 14 (3 self)
We present Census, a platform for building large-scale distributed applications. Census provides a membership service and a multicast mechanism. The membership service provides every node with a consistent view of the system membership, which may be global or partitioned into location-based regions. Census distributes membership updates with low overhead, propagates changes promptly, and is resilient to both crashes and Byzantine failures. We believe that Census is the first system to provide a consistent membership abstraction at very large scale, greatly simplifying the design of applications built atop large deployments such as multi-site data centers. Census builds on a novel multicast mechanism that is closely integrated with the membership service. It organizes nodes into a reliable overlay composed of multiple distribution trees, using network coordinates to minimize latency. Unlike other multicast systems, it avoids the cost of using distributed algorithms to construct and maintain trees. Instead, each node independently produces the same trees from the consistent membership view. Census uses this multicast mechanism to distribute membership updates, along with application-provided messages. We evaluate the platform under simulation and on a real-world deployment on PlanetLab. We find that it imposes minimal bandwidth overhead, is able to react quickly to node failures and changes in the system membership, and can scale to substantial size.
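The point that nodes with the same membership view can each derive the same tree without coordinating can be sketched as follows. This is not the Census construction, which the abstract says uses network coordinates and multiple trees. The function `build_tree`, the sort-by-identifier ordering, and the `fanout` parameter are assumptions made for illustration.

```python
def build_tree(members, fanout=2):
    """Deterministically derive a distribution tree from a membership view.

    Returns (root, children), where children maps each member to its child list.
    """
    if not members:
        raise ValueError("membership view is empty")
    # A canonical order makes the result depend only on the view's contents,
    # not on the order in which a node learned about the members.
    ordered = sorted(members)
    children = {member: [] for member in ordered}
    for i, member in enumerate(ordered[1:], start=1):
        parent = ordered[(i - 1) // fanout]
        children[parent].append(member)
    return ordered[0], children


# Two nodes hold the same view but learned it in a different order.
view_at_node_x = ["c", "a", "b", "d"]
view_at_node_y = ["d", "b", "a", "c"]
print(build_tree(view_at_node_x) == build_tree(view_at_node_y))
```

Because the function is pure and the views are identical as sets, no messages are needed to agree on the tree; only the membership view itself has to be kept consistent.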
Group Communication based on Standard Interfaces
- in 2nd IEEE Intl. Symp. on Network Computing and Applications (NCA-03)
, 2003
Cited by 12 (2 self)
While group communication systems have been proposed for some time, they are still not used much in actual systems. We believe that one reason for this is the lack of standardisation of group communication system interfaces. The paper proposes an architecture, using the standard decomposition into services, where services are based on standard interfaces: both interactions between services and interactions with the application use existing, open standards. A decomposition of the group communication into services is presented, along with a description of applicable standards. As an example, a group membership service based on the LDAP standard is discussed.
QuickSilver Scalable Multicast
, 2006
Cited by 12 (9 self)
Our work is motivated by a platform we’re building to support a new style of distributed programming, in which users drag and drop live components into live documents, often without needing to write new code. The capability requires a multicast layer that scales in dimensions not previously explored. In particular, live documents generate large numbers of multicast groups with irregular overlap. Traditional reliable multicast protocols were conceived for a single group at a time, and multi-group configurations can trigger costly resource contention. QuickSilver Scalable Multicast (QSM) solves these problems using two kinds of mechanisms. First, we introduce several techniques to aggregate traffic when groups overlap. But we also identify a previously unnoticed linkage between memory footprint and CPU consumption, motivating a second class of techniques that minimize memory use and CPU loads. The resulting system is fast, scales well, and is stable under stress. Moreover, our techniques should be applicable in other high-performance distributed systems.
QuickSilver Scalable Multicast (QSM)
- In: NCA ’08: Proceedings of the 2008 Seventh IEEE International Symposium on Network Computing and Applications
, 2008
Cited by 7 (4 self)
QSM is a multicast engine designed to support a style of distributed programming in which application objects are replicated among clients and updated via multicast. The model requires platforms that scale in dimensions previously unexplored; in particular, to large numbers of multicast groups. Prior systems weren’t optimized for such scenarios and can’t take advantage of regular group overlap patterns, a key feature of our application domain. Furthermore, little is known about performance and scalability of such systems in modern managed environments. We shed light on these issues and offer architectural insights based on our experience building QSM.
Using leader-based communication to improve the scalability of single-round group membership algorithms
- In International Parallel and Distributed Processing Symposium
, 2005
Cited by 5 (2 self)
Sigma, the first single-round group membership (GM) algorithm, was recently introduced and demonstrated to operate consistently with theoretical expectations in a simulated WAN environment. Sigma achieved membership configurations of similar quality to those of existing algorithms but required fewer message exchange rounds. We now consider Sigma in terms of scalability. Sigma involves all-to-all (A2A) communication among members. A2A protocols have been shown to perform worse than leader-based (LB) protocols in certain networks, due to greater message overhead and higher likelihood of message loss. Thus, although LB protocols often involve additional communication steps, they can be more efficient in practice, particularly in fault-prone networks with large numbers of participating nodes. In this paper, we present Leader-Based Sigma, which transforms the original all-to-all version into a more scalable centralized communication scheme, and discuss the rounds vs. messages tradeoff involved in optimizing GM algorithms for deployment in large-scale, fault-prone dynamic network environments.
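The rounds vs. messages tradeoff can be made concrete with a rough counting model. It is not taken from the paper: it assumes that one all-to-all round has every member send one message to every other member, and that a leader-based exchange has every member send one message to the leader, which then sends one reply to each member. The function names are hypothetical.

```python
def all_to_all_messages(n):
    """Messages in one all-to-all round among n members (assumed model)."""
    return n * (n - 1)


def leader_based_messages(n):
    """Messages in a gather step plus a broadcast step via one leader (assumed model)."""
    return 2 * (n - 1)


for n in (10, 100, 1000):
    print(n, all_to_all_messages(n), leader_based_messages(n))
```

Under these assumptions the all-to-all scheme finishes in one step but its message count grows quadratically, while the leader-based scheme needs two steps and grows linearly.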
A Virtually Synchronous Group Multicast Algorithm for WANs: Formal Approach
- SIAM Journal on Computing
, 2002
Cited by 5 (2 self)
This paper presents a formal design for a novel group communication service targeted for wide-area networks (WANs). The service provides Virtual Synchrony semantics. Such semantics facilitate the design of fault-tolerant distributed applications. The presented design is more suitable for WANs than previously suggested ones. In particular, it features the first algorithm to achieve Virtual Synchrony semantics in a single communication round. The design also employs a scalable WAN-oriented architecture: it effectively decouples the two main components of Virtually Synchronous group communication, namely group membership and reliable group multicast.
Extensible Architecture for High-Performance, Scalable, Reliable Publish-Subscribe Eventing and Notification
- In submission
, 2006
Cited by 5 (3 self)
Existing Web service notification and eventing standards are useful in many applications, but they have serious limitations that make them ill-suited for large-scale deployments, or as a middleware or a component-integration technology in today’s data centers. For example, it is not possible to use IP multicast or for recipients to forward messages to others; scalable notification trees must be set up manually; and no end-to-end security, reliability, or QoS guarantees can be provided. We propose an architecture that is free of such limitations and that may serve as a basis for extending or complementing the existing standards. The approach emerges from our work on QuickSilver, a new, extremely modular and extensible platform for high-performance, scalable, reliable eventing. Keywords: architecture; eventing; extensible; multicast; notification; publish-subscribe; reliable; scalable