Results 1 - 10
of
61
A Scalable Content-Addressable Network
- IN PROC. ACM SIGCOMM 2001
, 2001
"... Hash tables – which map “keys ” onto “values” – are an essential building block in modern software systems. We believe a similar functionality would be equally valuable to large distributed systems. In this paper, we introduce the concept of a Content-Addressable Network (CAN) as a distributed infra ..."
Abstract
-
Cited by 2353 (29 self)
- Add to MetaCart
Hash tables – which map “keys ” onto “values” – are an essential building block in modern software systems. We believe a similar functionality would be equally valuable to large distributed systems. In this paper, we introduce the concept of a Content-Addressable Network (CAN) as a distributed infrastructure that provides hash table-like functionality on Internet-like scales. The CAN is scalable, fault-tolerant and completely self-organizing, and we demonstrate its scalability, robustness and low-latency properties through simulation.
The physiology of the grid: An open grid services architecture for distributed systems integration
, 2002
"... In both e-business and e-science, we often need to integrate services across distributed, heterogeneous, dynamic “virtual organizations ” formed from the disparate resources within a single enterprise and/or from external resource sharing and service provider relationships. This integration can be t ..."
Abstract
-
Cited by 973 (28 self)
- Add to MetaCart
In both e-business and e-science, we often need to integrate services across distributed, heterogeneous, dynamic “virtual organizations ” formed from the disparate resources within a single enterprise and/or from external resource sharing and service provider relationships. This integration can be technically challenging because of the need to achieve various qualities of service when running on top of different native platforms. We present an Open Grid Services Architecture that addresses these challenges. Building on concepts and technologies from the Grid and Web services communities, this architecture defines a uniform exposed service semantics (the Grid service); defines standard mechanisms for creating, naming, and discovering transient Grid service instances; provides location transparency and multiple protocol bindings for service instances; and supports integration with underlying native platform facilities. The Open Grid Services Architecture also defines, in terms of Web Services Description Language (WSDL) interfaces and associated conventions, mechanisms required for creating and composing sophisticated distributed systems, including lifetime management, change management, and notification. Service bindings can support reliable invocation, authentication, authorization, and delegation, if required. Our presentation complements an earlier foundational article, “The Anatomy of the Grid, ” by describing how Grid mechanisms can implement a service-oriented architecture, explaining how Grid functionality can be incorporated into a Web services framework, and illustrating how our architecture can be applied within commercial computing as a basis for distributed system integration—within and across organizational domains. This is a DRAFT document and continues to be revised. The latest version can be found at
The design and implementation of an intentional naming system
- 17TH ACM SYMPOSIUM ON OPERATING SYSTEMS PRINCIPLES (SOSP '99) PUBLISHED AS OPERATING SYSTEMS REVIEW, 34(5):186--201, DEC. 1999
, 1999
"... This paper presents the design and implementation of the Intentional Naming System (INS), a resource discovery and service location system for dynamic and mobile networks of devices and computers. Such environments require a naming system that is (i) expressive, to describe and make requests based o ..."
Abstract
-
Cited by 417 (10 self)
- Add to MetaCart
This paper presents the design and implementation of the Intentional Naming System (INS), a resource discovery and service location system for dynamic and mobile networks of devices and computers. Such environments require a naming system that is (i) expressive, to describe and make requests based on specific properties of services, (ii) responsive, to track changes due to mobility and performance, (iii) robust, to handle failures, and (iv) easily configurable. INS uses a simple language based on attributes and values for its names. Applications use the language to describe what they are looking for (i.e., their intent), not where to find things (i.e., not hostnames). INS implements a late binding mechanism that integrates name resolution and message routing, enabling clients to continue communicating with end-nodes even if the name-to-address mappings change while a session is in progress. INS resolvers self-configure to form an application-level overlay network, which they use to discover new services, perform late binding, and maintain weak consistency of names using soft-state name exchanges and updates. We analyze the performance of the INS algorithms and protocols, present measurements of a Java-based implementation, and describe three applications we have implemented that demonstrate the feasibility and utility of INS.
Giggle: A Framework for Constructing Scalable Replica Location Services
, 2002
"... In wide area computing systems, it is often desirable to create remote read-only copies (replicas) of files. Replication can be used to reduce access latency, improve data locality, and/or increase robustness, scalability and performance for distributed applications. We define a replica location ser ..."
Abstract
-
Cited by 122 (36 self)
- Add to MetaCart
In wide area computing systems, it is often desirable to create remote read-only copies (replicas) of files. Replication can be used to reduce access latency, improve data locality, and/or increase robustness, scalability and performance for distributed applications. We define a replica location service (RLS) as a system that maintains and provides access to information about the physical locations of copies. An RLS typically functions as one component of a data grid architecture. This paper makes the following contributions. First, we characterize RLS requirements. Next, we describe a parameterized architectural framework, which we name Giggle (for GIGa-scale Global Location Engine), within which a wide range of RLSs can be defined. We define several concrete instantiations of this framework with different performance characteristics. Finally, we present initial performance results for an RLS prototype, demonstrating that RLS systems can be constructed that meet performance goals.
Fine-Grained Failover Using Connection Migration
, 2001
"... This paper presents a set of techniques for providing fine-grained failover of long-running connections across a distributed collection of replica servers, and is especially useful for fault-tolerant and load-balanced delivery of streaming media and telephony sessions. Our system achieves connection ..."
Abstract
-
Cited by 65 (1 self)
- Add to MetaCart
This paper presents a set of techniques for providing fine-grained failover of long-running connections across a distributed collection of replica servers, and is especially useful for fault-tolerant and load-balanced delivery of streaming media and telephony sessions. Our system achieves connection-level failover across both local- and wide-area server replication, without requiring a frontend transport- or application-layer switch. Our approach uses recently proposed end-to-end "connection migration" mechanisms for transport protocols such as TCP, combined with a soft-state session synchronization protocol between replica servers. The end
Exactly-once Delivery in a Content-based Publish-Subscribe System
- DSN
, 2002
"... This paper presents a general knowledge model for propagating information in a content-based publish-subscribe system. The model is used to derive an efficient and scalable protocol for exactly-once delivery to large numbers (tens of thousands per broker) of content-based subscribers in either publi ..."
Abstract
-
Cited by 48 (6 self)
- Add to MetaCart
This paper presents a general knowledge model for propagating information in a content-based publish-subscribe system. The model is used to derive an efficient and scalable protocol for exactly-once delivery to large numbers (tens of thousands per broker) of content-based subscribers in either publisher order or uniform total order. Our protocol allows intermediate content filtering at each hop, but requires persistent storage only at the publishing site. It is tolerant of message drops, message reorderings, node failures, and link failures, and maintains only "soft" state at intermediate nodes. We evaluate the performance of our implementation both under failure-free conditions and with fault injection.
TCP Veno: TCP Enhancement for Transmission Over Wireless Access Networks
- IEEE Journal on Selected Areas in Communications
, 2003
"... Wireless access networks in the form of wireless local area networks, home networks, and cellular networks are becoming an integral part of the Internet. Unlike wired networks, random packet loss due to bit errors is not negligible in wireless networks, and this causes significant performance degrad ..."
Abstract
-
Cited by 37 (2 self)
- Add to MetaCart
Wireless access networks in the form of wireless local area networks, home networks, and cellular networks are becoming an integral part of the Internet. Unlike wired networks, random packet loss due to bit errors is not negligible in wireless networks, and this causes significant performance degradation of transmission control protocol (TCP). We propose and study a novel end-to-end congestion control mechanism called TCP Veno that is simple and effective for dealing with random packet loss. A key ingredient of Veno is that it monitors the network congestion level and uses that information to decide whether packet losses are likely to be due to congestion or random bit errors. Specifically: 1) it refines the multiplicative decrease algorithm of TCP Reno---the most widely deployed TCP version in practice---by adjusting the slow-start threshold according to the perceived network congestion level rather than a fixed drop factor and 2) it refines the linear increase algorithm so that the connection can stay longer in an operating region in which the network bandwidth is fully utilized. Based on extensive network testbed experiments and live Internet measurements, we show that Veno can achieve significant throughput improvements without adversely affecting other concurrent TCP connections, including other concurrent Reno connections. In typical wireless access networks with 1% random packet loss rate, throughput improvement of up to 80% can be demonstrated. A salient feature of Veno is that it modifies only the sender-side protocol of Reno without changing the receiver-side protocol stack.
An Architecture for Secure Wide-Area Service Discovery
, 2002
"... This paper presents the architecture and implementation of a secure wide-area Service Discovery Service (SDS). Service providers use the SDS to advertise descriptions of available or already running services, while clients use the SDS to compose complex queries for locating these services. Service ..."
Abstract
-
Cited by 36 (2 self)
- Add to MetaCart
This paper presents the architecture and implementation of a secure wide-area Service Discovery Service (SDS). Service providers use the SDS to advertise descriptions of available or already running services, while clients use the SDS to compose complex queries for locating these services. Service descriptions and queries use the eXtensible Markup Language (XML) to encode such factors as cost, performance, location, and device- or service-specific capabilities. The SDS provides a fault-tolerant, incrementally scalable service for locating services in the wide-area. Security is a core component of the SDS: communications are both encrypted and authenticated where necessary, and the system uses a hybrid access control list and capability system to control access to service information. Wide-area query routing is also a core component of the SDS: all information in the system is potentially reachable by all clients
Improving Availability with Recursive Micro-Reboots: A Soft-State System Case Study
, 2003
"... Even after decades of software engineering research, complex computer systems still fail. This paper makes the case for increasing research emphasis on dependability and, specifically, on improving availability by reducing time-to-recover. All software fails at some point, so systems must be able to ..."
Abstract
-
Cited by 35 (4 self)
- Add to MetaCart
Even after decades of software engineering research, complex computer systems still fail. This paper makes the case for increasing research emphasis on dependability and, specifically, on improving availability by reducing time-to-recover. All software fails at some point, so systems must be able to recover from failures. Recovery itself can fail too, so systems must know how to intelligently retry their recovery. We present here a recursive approach, in which a minimal subset of components is recovered first
A Comparison of Hard-state and Soft-state Signaling Protocols
- In: Proc. of SIGCOMM 2003
, 2003
"... One of the key infrastructure components in all telecommunication networks, ranging from the telephone network, to VC-oriented data networks, to the Internet, is its signaling system. Two broad approaches towards signaling can be identified: so-called hard-state and soft-state approaches. Despite th ..."
Abstract
-
Cited by 32 (0 self)
- Add to MetaCart
One of the key infrastructure components in all telecommunication networks, ranging from the telephone network, to VC-oriented data networks, to the Internet, is its signaling system. Two broad approaches towards signaling can be identified: so-called hard-state and soft-state approaches. Despite the fundamental importance of signaling, our understanding of these approaches- their pros and cons and the circumstances in which they might best be employed- is mostly anecdotal (and occasionally religious). In this paper, we compare and contrast a variety of signaling approaches ranging from a “pure ” soft state, to soft-state approaches augmented with explicit state removal and/or reliable signaling, to a “pure ” hard state approach. We develop an analytic model that allows us to quantify state inconsistency in single- and multiple-hop signaling scenarios, and the “cost ” (both in terms of signaling overhead, and application-specific costs resulting from state inconsistency) associated with a given signaling approach and its parameters (e.g., state refresh and removal timers). Among the class of soft-state approaches, we find that a soft-state approach coupled with explicit removal substantially improves the degree of state consistency while introducing little additional signaling message overhead. The addition of reliable explicit setup/update/removal allows the soft-state approach to achieve comparable (and sometimes better) consistency than that of the hard-state approach. I.

