Approximation Algorithms for Data Placement in Arbitrary Networks
 in Proceedings of the 12th Annual ACM-SIAM Symposium on Discrete Algorithms
, 2001
Cited by 60 (2 self)
We develop approximation algorithms for the problem of placing replicated data in arbitrary networks, where the nodes may both issue requests for data objects and have capacity for storing data objects, so as to minimize the average data-access cost. We introduce the data placement problem to model this problem. We have a set of caches F, a set of clients D, and a set of data objects O. Each cache i can store at most u_i data objects. Each client j ∈ D has demand d_j for a specific data object o(j) ∈ O and has to be assigned to a cache that stores that object. Storing an object o in cache i incurs a storage cost of f_i^o, and assigning client j to cache i incurs an access cost of d_j c_ij. The goal is to find a placement of the data objects to caches respecting the capacity constraints, and an assignment of clients
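As a rough illustration of the objective described in this abstract, the following sketch evaluates the total cost (storage plus demand-weighted access) of one given placement and assignment, after checking the capacity and feasibility constraints. It is not the paper's approximation algorithm; the function name, the dictionary layout, and the example data are assumptions made only for this illustration.

```python
def placement_cost(placement, assignment, wanted, demand,
                   capacity, storage_cost, access_cost):
    """Total cost of a data placement (hypothetical helper, not from the paper).

    placement:    cache -> set of objects stored there
    assignment:   client -> cache serving that client
    wanted:       client -> the single object the client requests
    demand:       client -> demand for its object
    capacity:     cache -> maximum number of objects it can store
    storage_cost: cache -> {object: cost of storing it there}
    access_cost:  client -> {cache: per-unit access cost}
    """
    # Every client must be assigned to exactly one cache.
    if set(assignment) != set(wanted):
        raise ValueError("every client must be assigned to a cache")

    # Capacity constraint: a cache stores at most capacity[cache] objects.
    for cache, objects in placement.items():
        if len(objects) > capacity[cache]:
            raise ValueError(f"cache {cache!r} exceeds its capacity")

    # Feasibility: the assigned cache must hold the requested object.
    for client, cache in assignment.items():
        if wanted[client] not in placement.get(cache, set()):
            raise ValueError(f"cache {cache!r} lacks the object of client {client!r}")

    storage = sum(storage_cost[cache][obj]
                  for cache, objects in placement.items()
                  for obj in objects)
    access = sum(demand[client] * access_cost[client][cache]
                 for client, cache in assignment.items())
    return storage + access


# Made-up instance: two caches, two objects, two clients.
example = dict(
    placement={"a": {"x"}, "b": {"y"}},
    assignment={1: "a", 2: "b"},
    wanted={1: "x", 2: "y"},
    demand={1: 2, 2: 3},
    capacity={"a": 1, "b": 1},
    storage_cost={"a": {"x": 5, "y": 1}, "b": {"x": 1, "y": 4}},
    access_cost={1: {"a": 1, "b": 10}, 2: {"a": 10, "b": 2}},
)
total = placement_cost(**example)  # storage 5 + 4, access 2*1 + 3*2
```

An optimization algorithm would search over `placement` and `assignment`; this helper only scores one candidate solution.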
A Framework for Evaluating Replica Placement Algorithms
, 2002
Cited by 37 (1 self)
This paper introduces a framework for evaluating replica placement algorithms (RPA) for content delivery networks (CDN) as well as RPAs from other fields that might be applicable to current or future CDNs. First, the framework classifies and qualitatively compares RPAs using a generic set of primitives that capture problem definitions and heuristics. Second, it provides estimates for the decision times of RPAs using an analytic model. To achieve accuracy, the model takes into account disk accesses and message sizes, in addition to computational complexity and message numbers that have been considered traditionally. Third, it uses the "goodness" of produced placements to compare RPAs even when they have different problem definitions. Based on these evaluations, we identify open issues and potential areas for future research.
Approximation Algorithms for Data Management in Networks
, 2001
Cited by 15 (0 self)
This paper deals with static data management in computer systems connected by networks. A basic functionality in these systems is the interactive
A Data Tracking Scheme for General Networks
, 2001
Cited by 13 (0 self)
Consider an arbitrary distributed network in which large numbers of objects are continuously being created, replicated, and destroyed. A basic problem arising in such an environment is that of organizing a distributed directory service for locating object copies. In this paper, we present a new data tracking scheme for locating nearby copies of objects in arbitrary distributed environments. Our tracking scheme supports efficient accesses to data objects while keeping the local memory overhead low. In particular, our tracking scheme achieves an expected polylog(n) approximation in the cost of any access operation, for an arbitrary network. The memory overhead incurred by our scheme is O(polylog(n)) times the maximum number of objects stored at any node, with high probability. We also show that our tracking scheme adapts well to dynamic changes in the network.
Web Caching using Access Statistics
 PROCEEDINGS OF THE 12TH ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS
, 2001
Cited by 11 (5 self)
We consider the problem of caching web pages with the objective of minimizing latency of access. Demands for web domains/pages are computed using access statistics; the frequency with which these statistics change is considerably longer than the frequency of page requests. We model caches as being constrained by total size and total number of ports: each cache can handle only a limited request rate and can store only a limited number of domains (e.g. modelling bounded update traffic). When the caches have fixed locations, we present a constant factor approximation to the optimum average latency while exceeding capacity constraints by a logarithmic factor. We demonstrate improved results in the special case where no replication of pages is allowed. In the alternate model where we are allowed to place our own caches in the network for a cost, we produce a constant approximation to the weighted sum of cost and average latency. Finally, we consider several other variants of the problem which m...
A Unified Framework for Evaluating Replica Placement Algorithms
, 2002
Cited by 9 (0 self)
The placement of data to maximize the performance and minimize the cost of a computing system is an optimization problem that has been studied extensively in several fields, including distributed databases, storage systems and, more recently, content delivery networks. However, little has been done to compare the various approaches and their applicability to different systems.
Page migration in dynamic networks
, 2005
Cited by 3 (1 self)
In the last couple of decades, network-connected systems have gradually replaced centralized parallel computing machines. To provide smooth operation of network applications, the underlying system has to provide so-called basic services. One of the most crucial services is to provide transparent access to data like
Data management in hierarchical bus networks
 Proc. ACM Symp. Parallel Algorithms and Architectures
, 2000
Cited by 1 (0 self)
A hierarchical bus network T = (V, E) uses hierarchically, tree-like connected buses as a communication network. New communication technologies like SCI (Scalable Coherent Interface) (see, e.g., [6, 7]) make such networks very attractive, because they allow their easy construction and guarantee reasonable communication performance. Such networks can be modeled as tree networks: leaves correspond to processors, inner nodes to buses, edges to switches, and bandwidths of inner nodes and edges are related to bandwidths of buses and switches, respectively. In this paper we address the problem of static data management. Given a set of shared data objects X and the read and write frequencies from the processors to the shared data objects, the goal is to compute a (maybe redundant) placement of the shared data objects to the processors, such that the congestion (the maximum over the load of all edges and inner nodes, induced by the read and write frequencies, divided by the bandwidth of the edge or inner node, respectively) is minimized. It is known [10] that this problem can be solved optimally in linear time, if inner nodes are allowed to hold copies of shared data objects. In our model, inner nodes correspond to buses and therefore cannot store copies of shared data objects. We show that this restriction increases the complexity of the placement problem drastically: It becomes NP-hard. On the other hand, the main contribution of our paper is an approximation algorithm with runtime O(|X| · |V| · height(T) · log(degree(T))) that increases the congestion by a factor of at most 7.
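The congestion measure defined in this abstract (the largest load-to-bandwidth ratio over all edges and inner nodes) can be sketched as follows. The sketch assumes the loads induced by a placement have already been computed; how reads and writes translate into load is not specified in the abstract and is not modeled here. The function name and the example values are hypothetical.

```python
def congestion(load, bandwidth):
    """Maximum of load / bandwidth over all edges and inner nodes.

    load:      element -> load induced by the read and write frequencies
    bandwidth: element -> bandwidth of that edge or inner node
    Elements may be any hashable labels, e.g. "bus0" for an inner node
    or ("bus0", "p1") for an edge (labels chosen for illustration).
    """
    # default=0.0 covers the degenerate case of a network with no elements.
    return max((load[e] / bandwidth[e] for e in load), default=0.0)


# Made-up example: one bus connecting two processors.
load = {("bus0", "p1"): 6.0, ("bus0", "p2"): 2.0, "bus0": 8.0}
bandwidth = {("bus0", "p1"): 2.0, ("bus0", "p2"): 4.0, "bus0": 4.0}
worst = congestion(load, bandwidth)  # ratios are 3.0, 0.5 and 2.0
```

A placement algorithm in this setting would compare candidate placements by this value and prefer the one with the smallest congestion.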
DFG-Sonderforschungsbereich 376 and the IST Programme of the EU under contract number
This paper deals with static data management in computer systems connected by networks. A basic functionality in these systems is the interactive use of shared data objects that can be accessed from each computer in the system. Examples of these objects are files in distributed file systems, cache lines in virtual shared memory systems, or pages in the WWW. In the static scenario we are given read and write request frequencies for each computer-object pair. The goal is to calculate a placement of the objects to the memory modules, possibly with redundancy, such that a given cost function is minimized. With the widespread use of commercial networks such as the Internet, it is more and more important to consider commercial factors within data management strategies. The goal in previous work was to utilize the available resources, especially the bandwidth, as well as possible. We will present data management strategies for a model in which commercial cost instead of the communication cost is minimized, i.e., we are given a metric communication cost function and a storage cost function. We introduce new deterministic algorithms for the static data management problem on trees and arbitrary networks. Our algorithms aim to minimize the total cost. Note that this problem is MaxSNP-hard on arbitrary