Results 1 - 10 of 11
Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web
- IN PROC. 29TH ACM SYMPOSIUM ON THEORY OF COMPUTING (STOC)
, 1997
"... We describe a family of caching protocols for distrib-uted networks that can be used to decrease or eliminate the occurrence of hot spots in the network. Our protocols are particularly designed for use with very large networks such as the Internet, where delays caused by hot spots can be severe, and ..."
Abstract - Cited by 699 (10 self)
We describe a family of caching protocols for distributed networks that can be used to decrease or eliminate the occurrence of hot spots in the network. Our protocols are particularly designed for use with very large networks such as the Internet, where delays caused by hot spots can be severe, and where it is not feasible for every server to have complete information about the current state of the entire network. The protocols are easy to implement using existing network protocols such as TCP/IP, and require very little overhead. The protocols work with local control, make efficient use of existing resources, and scale gracefully as the network grows. Our caching protocols are based on a special kind of hashing that we call consistent hashing. Roughly speaking, a consistent hash function is one which changes minimally as the range of the function changes. Through the development of good consistent hash functions, we are able to develop caching protocols which do not require users to have a current or even consistent view of the network. We believe that consistent hash functions may eventually prove to be useful in other applications such as distributed name servers and/or quorum systems.
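To make the consistent-hashing idea above concrete, here is a minimal hash-ring sketch in Python. It illustrates the general technique rather than the paper's exact construction: the use of MD5, the number of virtual points per cache, and the cache names are assumptions made for the example.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    """Map a string to a point on the hash circle (128-bit MD5 value)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Each cache is placed at several points on a circle; a URL is served
    by the first cache clockwise from the URL's hash point."""

    def __init__(self, caches=(), replicas=100):
        self.replicas = replicas          # virtual points per cache
        self._points = []                 # sorted hash points
        self._owner = {}                  # hash point -> cache name
        for c in caches:
            self.add(c)

    def add(self, cache: str):
        for i in range(self.replicas):
            p = _hash(f"{cache}#{i}")
            bisect.insort(self._points, p)
            self._owner[p] = cache

    def remove(self, cache: str):
        for i in range(self.replicas):
            p = _hash(f"{cache}#{i}")
            self._points.remove(p)
            del self._owner[p]

    def lookup(self, url: str) -> str:
        """Cache responsible for a URL: first point clockwise from its hash."""
        p = _hash(url)
        idx = bisect.bisect(self._points, p) % len(self._points)
        return self._owner[self._points[idx]]

ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
print(ring.lookup("http://example.com/page.html"))
```

Because a joining or leaving cache only takes over (or releases) the points it owns, the URL-to-cache mapping changes only for keys in those segments; that is the "changes minimally as the range changes" property the abstract refers to.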
Novel Architectures for P2P Applications: the Continuous-Discrete Approach
- ACM TRANSACTIONS ON ALGORITHMS
, 2007
"... We propose a new approach for constructing P2P networks based on a dynamic decomposition of a continuous space into cells corresponding to processors. We demonstrate the power of these design rules by suggesting two new architectures, one for DHT (Distributed Hash Table) and the other for dynamic ex ..."
Abstract - Cited by 166 (8 self)
We propose a new approach for constructing P2P networks based on a dynamic decomposition of a continuous space into cells corresponding to processors. We demonstrate the power of these design rules by suggesting two new architectures, one for DHT (Distributed Hash Table) and the other for dynamic expander networks. The DHT network, which we call Distance Halving, allows logarithmic routing and load, while preserving constant degrees. Our second construction builds a network that is guaranteed to be an expander. The resulting topologies are simple to maintain and implement. Their simplicity makes it easy to modify and add protocols. We show it is possible to reduce the dilation and the load of the DHT with a small increase of the degree. We present a provably good protocol for relieving hot spots and a construction with high fault tolerance. Finally we show that, using our approach, it is possible to construct any family of constant degree graphs in a dynamic environment, though with worse parameters. Therefore we expect that more distributed data structures could be designed and implemented in a dynamic environment.
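A toy sketch of the continuous-discrete idea (not the paper's full Distance Halving construction; the hash function and node count are assumptions for illustration): nodes pick points in [0, 1), a node's cell is the segment up to the next point, items are hashed into [0, 1) and stored at the cell's owner, and the continuous halving maps x -> x/2 and x -> (x+1)/2 induce the discrete links between nodes.

```python
import bisect
import hashlib
import random

def to_unit(key: str) -> float:
    """Hash a key to a point of the continuous space [0, 1)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) / 2**128

class ContinuousDiscreteOverlay:
    """Nodes pick points in [0, 1); each node's cell runs from its point to the
    next node's point (wrapping around).  Items live at the owner of their
    hashed point, and the continuous halving edges x -> x/2 and x -> (x+1)/2
    become links between the nodes owning those points."""

    def __init__(self, node_points):
        self.points = sorted(node_points)

    def owner(self, x: float) -> float:
        """Node (identified by its point) whose cell contains x."""
        i = bisect.bisect_right(self.points, x) - 1
        return self.points[i % len(self.points)]   # i == -1 wraps to the last node

    def store(self, key: str) -> float:
        """Node responsible for an item."""
        return self.owner(to_unit(key))

    def neighbors(self, node_point: float):
        """Discrete edges induced by the continuous halving maps."""
        return {self.owner(node_point / 2), self.owner((node_point + 1) / 2)}

overlay = ContinuousDiscreteOverlay([random.random() for _ in range(8)])
node = overlay.store("object-42")
print(node, overlay.neighbors(node))
```

When a node joins it splits an existing cell, and when it leaves its cell is merged into a neighbor's, so maintenance stays local; this is what makes the decomposition dynamic.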
Coordinated Placement and Replacement for Large-Scale Distributed Caches
- IEEE Transactions on Knowledge and Data Engineering
, 1998
"... In a large-scale information system such as a digital library or the web, a set of distributed caches can improve their effectiveness by coordinating their data placement decisions. In this paper, we examine the design space for cooperative placement and replacement algorithms. Our main focus is on ..."
Abstract - Cited by 84 (8 self)
In a large-scale information system such as a digital library or the web, a set of distributed caches can improve their effectiveness by coordinating their data placement decisions. In this paper, we examine the design space for cooperative placement and replacement algorithms. Our main focus is on the placement algorithms, which attempt to solve the following problem: given a set of caches, the network distances between caches, and predictions of the access rates from each cache to a set of objects, determine where to place each object in order to minimize the average access cost. Replacement algorithms also attempt to minimize access cost, but they work by selecting which objects to evict when a cache miss occurs. Using simulation, we examine three practical cooperative placement algorithms including one that is provably close to optimal, and we compare these algorithms to the optimal placement algorithm and several cooperative and non-cooperative replacement algorithms. We draw fiv...
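As an illustration of the placement problem stated above, the following naive greedy sketch repeatedly adds the single object copy that most reduces the rate-weighted access cost. It is not one of the paper's algorithms (and is far from the provably near-optimal one); the uniform per-cache capacity and the fixed origin_cost for uncached objects are simplifying assumptions.

```python
import itertools

def greedy_placement(dist, rate, capacity, origin_cost):
    """Greedy cooperative placement sketch.

    dist[i][j]   -- network distance between caches i and j
    rate[i][k]   -- predicted access rate of cache i for object k
    capacity     -- number of objects each cache can hold
    origin_cost  -- cost of fetching an object held by no cache
    """
    n_caches, n_objects = len(rate), len(rate[0])
    placed = [set() for _ in range(n_caches)]           # objects held by each cache

    def access_cost(i, k):
        """Cost for cache i to reach the nearest copy of object k."""
        holders = [j for j in range(n_caches) if k in placed[j]]
        return min((dist[i][j] for j in holders), default=origin_cost)

    def total_cost():
        return sum(rate[i][k] * access_cost(i, k)
                   for i in range(n_caches) for k in range(n_objects))

    while any(len(p) < capacity for p in placed):
        best = None
        for j, k in itertools.product(range(n_caches), range(n_objects)):
            if len(placed[j]) >= capacity or k in placed[j]:
                continue
            before = total_cost()
            placed[j].add(k)                             # try this placement
            saving = before - total_cost()
            placed[j].remove(k)
            if best is None or saving > best[0]:
                best = (saving, j, k)
        if best is None or best[0] <= 0:                 # no placement helps any more
            break
        placed[best[1]].add(best[2])
    return placed
```

A caller would pass an n_caches x n_caches distance matrix and an n_caches x n_objects rate matrix, e.g. greedy_placement(dist, rate, capacity=2, origin_cost=100.0); the quadratic re-evaluation of total_cost keeps the sketch short at the expense of efficiency.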
Exploiting Locality for Data Management in Systems of Limited Bandwidth
- IN PROCEEDINGS OF THE 38TH ANNUAL IEEE SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE
, 1997
"... This paper deals with data management in computer systems in which the computing nodes are connected by a relatively sparse network. We consider the problem of placing and accessing a set of shared objects that are read and written from the nodes in the network. These objects are, e.g., global varia ..."
Abstract - Cited by 27 (3 self)
This paper deals with data management in computer systems in which the computing nodes are connected by a relatively sparse network. We consider the problem of placing and accessing a set of shared objects that are read and written from the nodes in the network. These objects are, e.g., global variables in a parallel program, pages or cache lines in a virtual shared memory system, shared files in a distributed file system, or pages in the World Wide Web. A data management strategy consists of a placement strategy that maps the objects (possibly dynamically and with redundancy) to the nodes, and an access strategy that describes how reads and writes are handled by the system (including the routing). We investigate static and dynamic data management strategies. In the static model, we assume that we are given an application for which the rates of read and write accesses for all node-object pairs are known. The goal is to calculate a static placement of the objects to the nodes in the ne...
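For the static model described above, here is a minimal sketch of the simplest special case: one copy per object and no bandwidth constraints, so it is not the redundant or dynamic placement the paper actually analyzes, and the cost model and argument names are assumptions. Each object is homed at the node minimizing the rate-weighted distance of all reads and writes to it.

```python
def static_single_copy_placement(dist, read_rate, write_rate):
    """Toy static placement: one copy per object, homed at the best node.

    dist[u][v]        -- distance between nodes u and v in the network
    read_rate[u][x]   -- read rate of node u for object x
    write_rate[u][x]  -- write rate of node u for object x
    """
    n_nodes, n_objects = len(read_rate), len(read_rate[0])
    home = {}
    for x in range(n_objects):
        def cost(v):
            # total rate-weighted distance if object x lives at node v
            return sum((read_rate[u][x] + write_rate[u][x]) * dist[u][v]
                       for u in range(n_nodes))
        home[x] = min(range(n_nodes), key=cost)
    return home
```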
Relieving Hot Spots on the World Wide Web
- IN PROCEEDINGS OF THE 29TH ANNUAL ACM SYMPOSIUM ON THE THEORY OF COMPUTING
, 1997
"... We describe a family of caching protocols for distributed networks that can be used to decrease or eliminate the occurrence of hot spots in the network. Hot spots are web sites that swamped by a large number of requests for their pages. Our protocols are particularly designed for use with very large ..."
Abstract - Cited by 3 (0 self)
We describe a family of caching protocols for distributed networks that can be used to decrease or eliminate the occurrence of hot spots in the network. Hot spots are web sites that are swamped by a large number of requests for their pages. Our protocols are particularly designed for use with very large networks such as the Internet, where delays caused by hot spots can be severe, and where it is not feasible for every server to have complete information about the current state of the entire network. The protocols are easy to implement using existing network protocols such as TCP/IP, and require very little overhead. The protocols work with local control, make efficient use of existing resources, and scale gracefully as the network grows.
Semantics of Caching with SPOCA: A Stateless, Proportional, Optimally-Consistent Addressing Algorithm
"... A key measure for the success of a Content Delivery Network is controlling cost of the infrastructure required to serve content to its end users. In this paper, we take a closer look at how Yahoo! efficiently serves millions of videos from its video library. A significant portion of this video libra ..."
Abstract - Cited by 2 (0 self)
A key measure for the success of a Content Delivery Network is controlling the cost of the infrastructure required to serve content to its end users. In this paper, we take a closer look at how Yahoo! efficiently serves millions of videos from its video library. A significant portion of this video library consists of a large number of relatively unpopular user-generated content and a small set of popular videos that changes over time. Yahoo!’s initial architecture to handle the distribution of videos to Internet clients used shared storage to hold the videos and a hardware load balancer to handle failures and balance the load across the front-end servers that did the actual transfers to the clients. The front-end servers used both their memory and hard drives as caches for the content they served. We found that this simple architecture did not use the front-end server caches effectively. We were able to improve our front-end caching while still being able to tolerate faults, gracefully handle the addition and removal of servers, and take advantage of geographic locality when serving content. We describe our solution, called SPOCA (Stateless, Proportional, Optimally-Consistent Addressing), which reduces disk cache misses from 5% to less than 1% and increases memory cache hits from 45% to 80%, raising overall cache hits from 95% to 99.6%. Unlike other consistent addressing mechanisms, SPOCA facilitates nearly-optimal load balancing.
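The abstract does not spell out how SPOCA computes its addresses, so the following is only a hedged sketch of one stateless, capacity-proportional, rehash-on-miss scheme with the properties named above; the hash-space layout, constants, and server names are assumptions for illustration, not Yahoo!'s implementation.

```python
import hashlib

class ProportionalStatelessAddresser:
    """Sketch of a stateless, capacity-proportional addressing scheme: each
    front-end server owns a slice of a sparse hash space sized by its relative
    capacity, and a content name is rehashed until it lands in some slice.
    No per-content state is stored, so every front end computes the same
    mapping independently."""

    SPACE = 2 ** 32                          # size of the sparse hash space (illustrative)

    def __init__(self, capacities):
        # capacities: {server_name: relative capacity}, e.g. {"fe1": 2, "fe2": 1}
        unit = self.SPACE // (4 * sum(capacities.values()))   # leave most space unassigned
        self.slices, start = {}, 0
        for server, cap in capacities.items():
            self.slices[server] = (start, start + cap * unit)
            start += cap * unit

    def _hash(self, value: str) -> int:
        return int(hashlib.sha1(value.encode()).hexdigest(), 16) % self.SPACE

    def locate(self, name: str) -> str:
        """Map a content name to the server that should cache and serve it."""
        point = self._hash(name)
        for _ in range(64):                  # in practice a slice is hit after a few tries
            for server, (lo, hi) in self.slices.items():
                if lo <= point < hi:
                    return server
            point = self._hash(str(point))   # miss: feed the point back in and rehash
        raise RuntimeError("no slice hit; slices cover too little of the space")

addresser = ProportionalStatelessAddresser({"fe1": 2, "fe2": 1, "fe3": 1})
print(addresser.locate("videos/cat.mp4"))
```

In a scheme like this, removing a server only deletes its slice (names that used to land there fall through to the next rehash), so server addition and removal disturb only the content mapped to the changed slice, similar in spirit to consistent hashing.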
Zatara, the Plug-in-able Eventually Consistent Distributed Database
"... With the proliferation of the computer Cloud, new software delivery methods were created. In order to build software to fit into one of these models, a scalable, easy to deploy storage tier is required. Distributed, non-SQL databases use multiple techniques to distribute information, guarantee data ..."
Abstract
With the proliferation of Cloud computing, new software delivery methods were created. In order to build software to fit into one of these models, a scalable, easy-to-deploy storage tier is required. Distributed, non-SQL databases use multiple techniques to distribute information, guarantee data consistency, and grow, but unfortunately most were designed with a single class of applications in mind, which means that they bring many constraints for developers. Existing solutions range from simple key-value databases to more complex approaches usually developed for data indexing. Our approach is a multi-purpose, distributed database engine that features an abstract query interface and plug-in-able internal data structures. The database, called ZATARA, is tested on an Amazon EC2 infrastructure with 196 nodes. In this environment it is able to deliver more than 20 million transactions per second, scaling almost linearly with the number of nodes. From the performance point of view, these results confirm our initial assumption that it is not necessary to expose a particular data structure in order for the database to scale. Although Zatara cannot replace SQL databases in all deployments, it provides sufficient flexibility to make it a viable choice for most applications that have to scale indefinitely. It can also be used as a caching system to reduce load on the SQL tier; in this scenario it is trivial to add it to an existing application.
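Since the abstract emphasizes an abstract query interface with plug-in-able internal data structures, here is a minimal sketch of what such a plug-in boundary could look like; every class and method name below is an illustrative assumption, not Zatara's actual API.

```python
from abc import ABC, abstractmethod

class StorageEngine(ABC):
    """Plug-in point: any data structure that can answer the abstract query
    interface can back a database node."""

    @abstractmethod
    def put(self, key, value):
        """Store value under key."""

    @abstractmethod
    def get(self, key):
        """Return the value for key, or None if absent."""

    @abstractmethod
    def scan(self, prefix):
        """Yield (key, value) pairs whose key starts with prefix."""

class HashEngine(StorageEngine):
    """Simplest plug-in: an in-memory hash map (unordered scans)."""
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data.get(key)
    def scan(self, prefix):
        return ((k, v) for k, v in self._data.items() if k.startswith(prefix))

class SortedEngine(StorageEngine):
    """Alternative plug-in: keys kept sorted, so prefix scans come back ordered."""
    def __init__(self):
        import bisect
        self._bisect, self._keys, self._vals = bisect, [], {}
    def put(self, key, value):
        if key not in self._vals:
            self._bisect.insort(self._keys, key)
        self._vals[key] = value
    def get(self, key):
        return self._vals.get(key)
    def scan(self, prefix):
        i = self._bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._vals[self._keys[i]]
            i += 1

# Application code talks only to the abstract interface, so the internal
# structure can be swapped without exposing it to the developer.
engine = SortedEngine()
engine.put("user:2", b"bob")
engine.put("user:1", b"alice")
print(list(engine.scan("user:")))
```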