A Framework for Reliable and Efficient Data Placement in Distributed Computing Systems
- Journal of Parallel and Distributed Computing, 2005
Cited by 17 (2 self)
Data placement is an essential part of today’s distributed applications, since moving data close to the application has many benefits. The increasing data requirements of both scientific and commercial applications, together with collaborative access to these data, make it even more important. In the current approach, data placement is regarded as a side effect of computation. Our goal is to make data placement a first-class citizen in distributed computing systems, just like computational jobs: data placement jobs will be queued, scheduled, monitored, managed, and even checkpointed. Because data placement jobs have different characteristics from computational jobs, they cannot be treated in exactly the same way. For this purpose, we propose a framework that can be considered a “data placement subsystem” for distributed computing systems, similar to the I/O subsystem in operating systems. This framework includes a specialized scheduler for data placement, a high-level planner aware of data placement jobs, a resource broker/policy enforcer, and some optimization tools. Our system can perform reliable and efficient data placement, recover from all kinds of failures without any human intervention, and dynamically adapt to the environment at execution time.
Key words: distributed computing, reliable and efficient data placement, scheduling, run-time adaptation, protocol auto-tuning, data-intensive applications, I/O subsystem.
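The idea of treating a data placement request as a first-class job that is queued, scheduled, monitored, and retried after a failure can be illustrated with a minimal Python sketch. It is not the framework described in the abstract: the names `PlacementJob`, `PlacementScheduler`, and `make_flaky_transfer` are hypothetical, the FIFO order and the fixed retry limit are assumed policies, and the transfer function is a stand-in that only simulates failures.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass
class PlacementJob:
    """A data placement request modeled as a schedulable job (hypothetical type)."""
    src: str
    dest: str
    max_attempts: int = 3      # assumed retry limit, not taken from the paper
    attempts: int = 0
    status: str = "queued"     # queued -> running -> done | failed


class PlacementScheduler:
    """Toy FIFO scheduler that requeues failed transfers (assumed policy)."""

    def __init__(self, transfer: Callable[[str, str], None]) -> None:
        self.transfer = transfer
        self.queue: Deque[PlacementJob] = deque()
        self.log: List[str] = []   # simple monitoring record

    def submit(self, job: PlacementJob) -> None:
        job.status = "queued"
        self.queue.append(job)

    def run(self) -> None:
        while self.queue:
            job = self.queue.popleft()
            job.status = "running"
            job.attempts += 1
            try:
                self.transfer(job.src, job.dest)
            except OSError as exc:
                self.log.append(f"{job.src}: attempt {job.attempts} failed ({exc})")
                if job.attempts < job.max_attempts:
                    self.submit(job)       # retry later without human intervention
                else:
                    job.status = "failed"  # give up after the retry limit
            else:
                job.status = "done"
                self.log.append(f"{job.src}: done after {job.attempts} attempt(s)")


def make_flaky_transfer(failures: int) -> Callable[[str, str], None]:
    """Return a stand-in transfer that fails `failures` times, then succeeds."""
    remaining = {"n": failures}

    def transfer(src: str, dest: str) -> None:
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise OSError("simulated network error")

    return transfer


if __name__ == "__main__":
    scheduler = PlacementScheduler(make_flaky_transfer(2))
    job = PlacementJob(src="remote:/data/input.dat", dest="local:/scratch/input.dat")
    scheduler.submit(job)
    scheduler.run()
    print(job.status, job.attempts)
    for line in scheduler.log:
        print(line)
```

Keeping the transfer function separate from the scheduler means the queueing and retry logic can be exercised without any real network protocol; a real system would replace the stand-in with actual transfer calls and a richer scheduling policy.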
An Overview of Data Replication on the Internet
- In Proc. of the International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN), 2002
Cited by 12 (3 self)
The proliferation of the Internet has raised expectations of fast turnaround times. Clients abandoning their connections because of excessive download delays translate directly into lost profit. Hence, minimizing the latency perceived by end users has become the primary performance objective, ahead of more traditional concerns such as server utilization. Two promising techniques for improving Internet responsiveness are caching and replication. In this paper we present an overview of recent research in replication. We begin by arguing for the important role of replication in decreasing client-perceived response time, and then illustrate the main topics that affect its successful deployment on the Internet. We analyze and characterize existing research, providing taxonomies and classifications whenever possible. Our discussion reveals several open problems and research directions.
Data Placement in Widely Distributed Systems
- 2005
Cited by 3 (2 self)
The unbounded increase in the computation and data requirements of scientific applications has necessitated the use of widely distributed compute and storage resources to meet the demand. In such an environment, data is no longer locally accessible and must be retrieved and stored remotely. Efficient and reliable access to data sources and archiving destinations in a widely distributed environment brings new challenges. Placing data on temporary local storage devices offers many advantages, but such “data placements” also require careful management of storage resources and data movement, i.e., allocating storage space, staging in input data, staging out generated data, and de-allocating local storage after the data is safely stored at the destination. Existing systems closely couple data placement and computation, and consider data placement a side effect of computation. Data placement is either embedded in the computation, delaying it, or performed by simple scripts that do not have the privileges of a job. In this dissertation, we propose a framework that decouples computation and data placement, allows asynchronous execution of each, and treats data
Run-time Adaptation of Grid Data Placement Jobs
- In Proceedings of Int. Workshop on Adaptive Grid Middleware, 2003
The grid presents a continuously changing environment and introduces a new set of failures. The data grid initiative has made it possible to run data-intensive applications on the grid. Data-intensive grid applications consist of two parts: a data placement part and a computation part. The data placement part is responsible for transferring the input data to the compute node and the result of the computation to the appropriate storage system. While work has been done on making computation adapt to changing conditions, little has been done on making data placement adapt to them.

