Results 1 - 10 of 185
The state of the art in distributed query processing
ACM Computing Surveys, 2000. Cited by 320 (3 self).
"... Distributed data processing is fast becoming a reality. Businesses want to have it for many reasons, and they often must have it in order to stay competitive. While much of the infrastructure for distributed data processing is already in place (e.g., modern network technology), there are a number of ..."
Abstract
-
Cited by 320 (3 self)
- Add to MetaCart
(Show Context)
Distributed data processing is fast becoming a reality. Businesses want to have it for many reasons, and they often must have it in order to stay competitive. While much of the infrastructure for distributed data processing is already in place (e.g., modern network technology), there are a number of issues which still make distributed data processing a complex undertaking: (1) distributed systems can become very large, involving thousands of heterogeneous sites including PCs and mainframe server machines; (2) the state of a distributed system changes rapidly because the load of sites varies over time and new sites are added to the system; (3) legacy systems need to be integrated; such legacy systems usually have not been designed for distributed data processing and now need to interact with other (modern) systems in a distributed environment. This paper presents the state of the art of query processing for distributed database and information systems. The paper presents the "textbook" architecture for distributed query processing and a series of techniques that are particularly useful for distributed database systems. These techniques include special join techniques, techniques to exploit intra-query parallelism, techniques to reduce communication costs, and techniques to exploit caching and replication of data. Furthermore, the paper discusses different kinds of distributed systems such as client-server, middleware (multi-tier), and heterogeneous database systems and shows how query processing works in these systems. Categories and subject descriptors: E.5 [Data]: Files; H.2.4 [Database Management Systems]: distributed databases, query processing; H.2.5 [Heterogeneous Databases]: data translation. General terms: algorithms; performance. Additional key words and phrases: query optimization; query execution; client-server databases; middleware; multi-tier architectures; database application systems; wrappers; replication; caching; economic models for query processing; dissemination-based information systems.
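Among the communication-cost-reduction techniques such a survey covers, the semijoin reducer is the classic example: rather than shipping a whole relation between sites, one site ships only the distinct join-key values, the other site filters, and only matching tuples travel back. A minimal sketch, with toy in-memory lists standing in for remote relations (all names are illustrative):

```python
# Semijoin reduction: a classic technique for cutting communication cost
# in distributed joins. Toy in-memory relations stand in for remote sites.

def semijoin_reduce(r_rows, s_rows, r_key, s_key):
    """Join R (site 1) with S (site 2), shipping only what is needed."""
    # Step 1: site 1 ships just the distinct join-key values of R to
    # site 2 (much smaller than shipping all of R).
    r_keys = {row[r_key] for row in r_rows}

    # Step 2: site 2 filters S down to tuples that can possibly join,
    # and ships only that reduced relation back to site 1.
    s_reduced = [row for row in s_rows if row[s_key] in r_keys]

    # Step 3: site 1 performs the final join locally.
    s_index = {}
    for row in s_reduced:
        s_index.setdefault(row[s_key], []).append(row)
    return [{**r, **s} for r in r_rows for s in s_index.get(r[r_key], [])]

# Toy usage: only the key column and one matching tuple cross the network.
R = [{"emp": "ann", "dept": 1}, {"emp": "bob", "dept": 2}]
S = [{"dept": 1, "name": "sales"}, {"dept": 3, "name": "hr"}]
print(semijoin_reduce(R, S, r_key="dept", s_key="dept"))
```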
The design of the Borealis stream processing engine
In CIDR, 2005. Cited by 250 (10 self).
"... Borealis is a second-generation distributed stream processing engine that is being developed at Brandeis University, Brown University, and MIT. Borealis inherits core stream processing functionality from Aurora [14] and distribution functionality from Medusa [51]. Borealis modifies and extends both ..."
Abstract
-
Cited by 250 (10 self)
- Add to MetaCart
(Show Context)
Borealis is a second-generation distributed stream processing engine that is being developed at Brandeis University, Brown University, and MIT. Borealis inherits core stream processing functionality from Aurora [14] and distribution functionality from Medusa [51]. Borealis modifies and extends both systems in non-trivial and critical ways to provide advanced capabilities that are commonly required by newly-emerging stream processing applications. In this paper, we outline the basic design and functionality of Borealis. Through sample real-world applications, we motivate the need for dynamically revising query results and modifying query specifications. We then describe how Borealis addresses these challenges through an innovative set of features, including revision records, time travel, and control lines. Finally, we present a highly flexible and scalable QoS-based optimization model that operates across server and sensor networks and a new fault-tolerance model with flexible consistency-availability trade-offs.
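The abstract's "revision records" idea (correcting previously emitted tuples rather than recomputing downstream results from scratch) can be illustrated with a toy running aggregate. The record shape and field names below are my own illustration, not Borealis's actual format:

```python
# Hypothetical sketch of the revision-record idea: a stream carries not
# only insertions but corrections to earlier tuples, and a running
# aggregate amends its result incrementally instead of recomputing.

class RunningSum:
    def __init__(self):
        self.total = 0
        self.seen = {}             # tuple id -> last accepted value

    def process(self, record):
        kind, tid, value = record  # ("insert" | "revise", id, value)
        if kind == "insert":
            self.seen[tid] = value
            self.total += value
        elif kind == "revise":
            # Undo the old contribution, apply the corrected one.
            self.total += value - self.seen[tid]
            self.seen[tid] = value
        return self.total

agg = RunningSum()
for rec in [("insert", 1, 10), ("insert", 2, 5), ("revise", 1, 7)]:
    print(agg.process(rec))        # 10, 15, then 12 after the correction
```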
Identifying Dynamic Replication Strategies for a High-Performance Data Grid
In Proc. of the International Grid Computing Workshop, 2001. Cited by 157 (5 self).
"... . Dynamic replication can be used to reduce bandwidth consumption and access latency in high performance "data grids" where users require remote access to large files. Different replication strategies can be defined depending on when, where, and how replicas are created and destroyed. W ..."
Abstract
-
Cited by 157 (5 self)
- Add to MetaCart
Dynamic replication can be used to reduce bandwidth consumption and access latency in high-performance "data grids" where users require remote access to large files. Different replication strategies can be defined depending on when, where, and how replicas are created and destroyed. We describe a simulation framework that we have developed to enable comparative studies of alternative dynamic replication strategies. We present preliminary results obtained with this simulator, in which we evaluate the performance of five different replication strategies for three different kinds of access patterns. The data in this scenario is read-only and so there are no consistency issues involved. The simulation results show that significant savings in latency and bandwidth can be obtained if the access patterns contain a small degree of geographical locality.
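A simulation framework of this kind reduces to a loop over synthetic requests with a pluggable replication policy. The skeleton below is a deliberately minimal stand-in for the paper's simulator; the two strategies and the locality model are illustrative assumptions, not the paper's five strategies or three access patterns:

```python
# Minimal skeleton of a replication-strategy simulator: sites request
# read-only files and a pluggable strategy decides whether to replicate
# locally. Strategy names and the locality model are illustrative.
import random

def simulate(strategy, n_sites=5, n_files=20, n_requests=2000, locality=0.8):
    replicas = {f: {0} for f in range(n_files)}   # site 0 acts as the origin
    home = {s: random.sample(range(n_files), 4) for s in range(n_sites)}
    traffic = 0
    for _ in range(n_requests):
        site = random.randrange(n_sites)
        # With probability `locality`, request a file popular at this site.
        f = random.choice(home[site]) if random.random() < locality \
            else random.randrange(n_files)
        if site not in replicas[f]:
            traffic += 1                          # remote fetch: one transfer
            if strategy == "replicate-on-read":   # vs. "no-replication"
                replicas[f].add(site)
    return traffic

random.seed(0)
for strat in ("no-replication", "replicate-on-read"):
    print(strat, simulate(strat))
```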
Adaptive precision setting for cached approximate values
In Proc. ACM SIGMOD, 2001. Cited by 131 (5 self).
"... Caching approximate values instead of exact values presents an opportunity for performance gains in exchange for decreased precision. To maximize the performance improvement, cached approximations must be of appropriate precision: approximations that are too precise easily become invalid, requiring ..."
Abstract
-
Cited by 131 (5 self)
- Add to MetaCart
Caching approximate values instead of exact values presents an opportunity for performance gains in exchange for decreased precision. To maximize the performance improvement, cached approximations must be of appropriate precision: approximations that are too precise easily become invalid, requiring frequent refreshing, while overly imprecise approximations are likely to be useless to applications, which must then bypass the cache. We present a parameterized algorithm for adjusting the precision of cached approximations adaptively to achieve the best performance as data values, precision requirements, or workload vary. We consider interval approximations to numeric values but our ideas can be extended to other kinds of data and approximations. Our algorithm strictly generalizes previous adaptive caching algorithms for exact copies: we can set parameters to require that all approximations be exact, in which case our algorithm dynamically chooses whether or not to cache each data value. We have implemented our algorithm and tested it on synthetic and real-world data. A number of experimental results are reported, showing the effectiveness of our algorithm at maximizing performance, and also showing that in the special case of exact caching our algorithm performs as well as previous algorithms. In cases where bounded imprecision is acceptable, our algorithm easily outperforms previous algorithms for exact caching.
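The core mechanism is concrete enough to sketch: the cache stores an interval [low, high] rather than an exact value, the source triggers a refresh only when the true value escapes the interval, and tunable parameters widen or narrow intervals to trade refresh traffic against precision. The class below is an illustrative reading of that scheme, not the paper's parameterized algorithm; the grow/shrink knobs are stand-ins for its tuning parameters:

```python
# Illustrative sketch of adaptive interval caching: the cache holds a
# bound [low, high] instead of an exact value; a refresh happens only
# when the source value escapes the interval, and the width adapts.

class IntervalCache:
    def __init__(self, value, width=4.0, grow=2.0, shrink=0.5):
        self.low, self.high = value - width / 2, value + width / 2
        self.grow, self.shrink = grow, shrink
        self.refreshes = 0

    def source_update(self, value):
        """Called at the source on every update; refreshes only on escape."""
        if not (self.low <= value <= self.high):
            # Refresh: re-center and widen, since refreshes are expensive.
            self.refreshes += 1
            width = (self.high - self.low) * self.grow
            self.low, self.high = value - width / 2, value + width / 2

    def query(self, max_width):
        """An application query with a precision requirement."""
        if self.high - self.low <= max_width:
            return (self.low, self.high)   # hit: the bound is precise enough
        # Too imprecise: the application must fetch the exact value (not
        # shown); shrink so future queries of this precision can be served.
        width = (self.high - self.low) * self.shrink
        mid = (self.low + self.high) / 2
        self.low, self.high = mid - width / 2, mid + width / 2
        return None                        # signal a cache miss

c = IntervalCache(100.0)
c.source_update(103.0)                     # escapes [98, 102] -> refresh
print(c.refreshes, (c.low, c.high))        # 1 (99.0, 107.0)
```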
Chain Replication for Supporting High Throughput and Availability
"... Chain replication is a new approach to coordinating clusters of fail-stop storage servers. The approach is intended for supporting large-scale storage services that exhibit high throughput and availability without sacrificing strong consistency guarantees. Besides outlining the chain replication pro ..."
Abstract
-
Cited by 113 (5 self)
- Add to MetaCart
Chain replication is a new approach to coordinating clusters of fail-stop storage servers. The approach is intended for supporting large-scale storage services that exhibit high throughput and availability without sacrificing strong consistency guarantees. Besides outlining the chain replication protocols themselves, the paper reports simulation experiments that explore the performance characteristics of a prototype implementation. Throughput, availability, and several object-placement strategies (including schemes based on distributed hash table routing) are discussed.
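The protocol itself is simple to state: clients direct updates to the head of the chain, each replica forwards the update to its successor, and queries are served only by the tail, which by construction has seen every fully propagated write. A minimal single-process sketch (failure handling and chain reconfiguration omitted):

```python
# Minimal sketch of chain replication: writes enter at the head and
# propagate replica-to-replica down the chain; reads go only to the
# tail, so a read never observes a write that is not yet stored on
# every replica (strong consistency).

class ChainNode:
    def __init__(self):
        self.store = {}
        self.successor = None

    def write(self, key, value):
        self.store[key] = value
        if self.successor:                # forward down the chain
            self.successor.write(key, value)
        # The tail (no successor) would ack back up the chain here.

class Chain:
    def __init__(self, length=3):
        self.nodes = [ChainNode() for _ in range(length)]
        for a, b in zip(self.nodes, self.nodes[1:]):
            a.successor = b
        self.head, self.tail = self.nodes[0], self.nodes[-1]

    def write(self, key, value):
        self.head.write(key, value)       # clients send updates to the head

    def read(self, key):
        return self.tail.store.get(key)   # clients query the tail only

chain = Chain()
chain.write("x", 42)
print(chain.read("x"))                    # 42, visible only after full propagation
```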
Locating Objects in Mobile Computing
2001. Cited by 105 (7 self).
"... In current distributed systems, the notion of mobility is emerging in many forms and applications. ..."
Abstract
-
Cited by 105 (7 self)
- Add to MetaCart
(Show Context)
In current distributed systems, the notion of mobility is emerging in many forms and applications.
Approximation Algorithms for Data Placement in Arbitrary Networks
2001. Cited by 84 (4 self).
"... We study approximation algorithms for placing replicated data in arbitrary networks. Consider a network of nodes with individual storage capacities and a metric communication cost function, in which each node periodically issues a request for an object drawn from a collection of uniform-length objec ..."
Abstract
-
Cited by 84 (4 self)
- Add to MetaCart
(Show Context)
We study approximation algorithms for placing replicated data in arbitrary networks. Consider a network of nodes with individual storage capacities and a metric communication cost function, in which each node periodically issues a request for an object drawn from a collection of uniform-length objects. We consider the problem of placing copies of the objects among the nodes such that the average access cost is minimized. Our main result is a polynomial-time constant-factor approximation algorithm for this placement problem. Our algorithm is based on a careful rounding of a linear programming relaxation of the problem. We also show that the data placement problem is MAXSNP-hard. We extend our approximation result to a generalization of the data placement problem that models additional costs such as the cost of realizing the placement. We also show that when object lengths are non-uniform, a constant-factor approximation is achievable if the capacity at each node in the approximate solution is allowed to exceed that in the optimal solution by the length of the largest object.
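The abstract fully specifies the optimization problem, so one can write down the natural LP relaxation that a rounding algorithm of this kind would start from. The formulation below is my own reconstruction from the problem statement, not necessarily the paper's exact LP: x_{ij} indicates (fractionally) that node i stores object j, y_{ikj} is the fraction of node k's demand for object j served by node i, d_{kj} is k's request rate for j, c_{ik} the metric distance, and C_i node i's capacity in objects (lengths are uniform).

```latex
\begin{align*}
\min\ & \sum_{k,j} d_{kj} \sum_i c_{ik}\, y_{ikj} \\
\text{s.t.}\ & \sum_i y_{ikj} = 1 \quad \forall k, j
    \qquad \text{(every request is served from somewhere)} \\
& y_{ikj} \le x_{ij} \quad \forall i, k, j
    \qquad \text{(serve only from nodes holding a copy)} \\
& \sum_j x_{ij} \le C_i \quad \forall i
    \qquad \text{(capacity, uniform-length objects)} \\
& x_{ij},\ y_{ikj} \in [0,1]
\end{align*}
```

The integral placement problem restricts x_{ij} to {0,1}; per the abstract, the constant-factor guarantee comes from carefully rounding an optimal fractional solution of a relaxation like this one.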
Coordinated Placement and Replacement for Large-Scale Distributed Caches
IEEE Transactions on Knowledge and Data Engineering, 1998. Cited by 84 (8 self).
"... In a large-scale information system such as a digital library or the web, a set of distributed caches can improve their effectiveness by coordinating their data placement decisions. In this paper, we examine the design space for cooperative placement and replacement algorithms. Our main focus is on ..."
Abstract
-
Cited by 84 (8 self)
- Add to MetaCart
In a large-scale information system such as a digital library or the web, a set of distributed caches can improve their effectiveness by coordinating their data placement decisions. In this paper, we examine the design space for cooperative placement and replacement algorithms. Our main focus is on the placement algorithms, which attempt to solve the following problem: given a set of caches, the network distances between caches, and predictions of the access rates from each cache to a set of objects, determine where to place each object in order to minimize the average access cost. Replacement algorithms also attempt to minimize access cost, but they work by selecting which objects to evict when a cache miss occurs. Using simulation, we examine three practical cooperative placement algorithms including one that is provably close to optimal, and we compare these algorithms to the optimal placement algorithm and several cooperative and non-cooperative replacement algorithms. We draw fiv...
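A cooperative placement algorithm in this design space can be as simple as a greedy loop: given distances and predicted access rates, repeatedly make the single extra copy that most reduces total expected access cost until capacities are exhausted. The sketch below is an illustrative greedy heuristic in that spirit, not necessarily one of the paper's three algorithms; origin_cost is an assumed penalty for fetching from the origin server:

```python
# Illustrative greedy cooperative placement: repeatedly make the single
# placement with the largest marginal reduction in total expected access
# cost, until every cache is full.
# dist[k][i]: network distance, rate[k][j]: cache k's rate for object j.

def greedy_place(dist, rate, capacity, origin_cost=100.0):
    n_caches, n_objects = len(rate), len(rate[0])
    placed = [set() for _ in range(n_caches)]

    def cost(k, j):
        # Cheapest current way for cache k to obtain object j.
        options = [dist[k][i] for i in range(n_caches) if j in placed[i]]
        return min(options, default=origin_cost)

    while True:
        best = None   # (savings, cache, object)
        for i in range(n_caches):
            if len(placed[i]) >= capacity[i]:
                continue
            for j in range(n_objects):
                if j in placed[i]:
                    continue
                savings = sum(
                    rate[k][j] * max(0.0, cost(k, j) - dist[k][i])
                    for k in range(n_caches))
                if savings > 0 and (best is None or savings > best[0]):
                    best = (savings, i, j)
        if best is None:
            return placed
        placed[best[1]].add(best[2])

# Two caches, two objects: each cache takes the object it requests most.
dist = [[0, 10], [10, 0]]
rate = [[5, 1], [1, 5]]
print(greedy_place(dist, rate, capacity=[1, 1]))   # [{0}, {1}]
```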
RaDaR: A Scalable Architecture for a Global Web Hosting Service
1999. Cited by 77 (2 self).
"... As commercial interest in the Internet grows, more and more companies are o#ering the service of hosting and providing access to information that belongs to third-party information providers. In the future, successful hosting services may host millions of objects on thousands of servers deployed a ..."
Abstract
-
Cited by 77 (2 self)
- Add to MetaCart
(Show Context)
As commercial interest in the Internet grows, more and more companies are offering the service of hosting and providing access to information that belongs to third-party information providers. In the future, successful hosting services may host millions of objects on thousands of servers deployed around the globe. To provide reasonable access performance to popular resources, these resources will have to be mirrored on multiple servers. In this paper, we identify some challenges due to the scale that a platform for such global services would face, and propose an architecture capable of handling this scale. The proposed architecture has no bottleneck points. A trace-driven simulation using an access trace from AT&T's hosting service shows very promising results for our approach. Keywords: hosting service, scalable architecture, dynamic replication, migration.
A Dynamic Object Replication and Migration Protocol for an Internet Hosting Service
In Proc. of IEEE ICDCS, 1998. Cited by 74 (7 self).
"... This paper proposes a protocol suite for dynamic replication and migration of Internet objects. It consists of an algorithm for deciding on the number and location of object replicas and an algorithm for distributing requests among currently available replicas. Our approach attempts to place replica ..."
Abstract
-
Cited by 74 (7 self)
- Add to MetaCart
This paper proposes a protocol suite for dynamic replication and migration of Internet objects. It consists of an algorithm for deciding on the number and location of object replicas and an algorithm for distributing requests among currently available replicas. Our approach attempts to place replicas in the vicinity of a majority of requests while ensuring at the same time that no servers are overloaded. The request distribution algorithm uses the same simple mechanism to take into account both server proximity and load, without actually knowing the latter. The replica placement algorithm executes autonomously on each node, without the knowledge of other object replicas in the system. The proposed algorithms rely on the information available in databases maintained by Internet routers. A simulation study using synthetic workloads and the network backbone of UUNET, one of the largest Internet service providers, shows that the proposed protocol is effective in eliminating hot spots and ...
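The autonomous per-node decision the abstract describes can be sketched as a periodic rule: each replica holder looks at where its requests arrived from and replicates or migrates toward the preponderance of demand, dropping its copy when demand falls too low. The thresholds and action names below are illustrative stand-ins, not the paper's exact protocol:

```python
# Illustrative per-node replication/migration rule in the spirit of the
# paper: a replica holder inspects where its requests come from and
# moves copies toward the preponderance of demand.

def decide(requests_by_neighbor, local_requests,
           replicate_frac=0.5, delete_threshold=5):
    """Return an action for one object at one node per decision epoch.

    requests_by_neighbor: {neighbor: forwarded request count this epoch}
    local_requests: requests from this node's own clients
    """
    total = local_requests + sum(requests_by_neighbor.values())
    if total < delete_threshold:
        return ("drop-replica",)          # too little demand to keep a copy
    if requests_by_neighbor:
        neighbor, count = max(requests_by_neighbor.items(),
                              key=lambda item: item[1])
        if count > replicate_frac * total:
            # Most demand arrives via one neighbor: push a copy toward it.
            return ("replicate-to", neighbor)
        if local_requests == 0 and count == total:
            return ("migrate-to", neighbor)
    return ("keep",)

print(decide({"A": 30, "B": 5}, local_requests=10))   # ('replicate-to', 'A')
print(decide({"A": 2}, local_requests=1))             # ('drop-replica',)
```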