Process migration
ACM Computing Surveys, 2000
Cited by 62 (1 self)
A process is an operating system abstraction representing an instance of a running computer program. Process migration is the act of transferring a process between two machines during its execution. Several implementations
REXEC: A Decentralized, Secure Remote Execution Environment for Parallel and Sequential Programs
4th Workshop on Communication, Architecture, and Applications for Network-based Parallel Computing, 2000
Cited by 27 (1 self)
Bringing clusters of computers into the mainstream as general-purpose computing systems requires that better facilities for transparent remote execution of parallel and sequential applications be developed. While much research has been done in this area, most of this work remains inaccessible for clusters built using contemporary hardware and operating systems. Implementations are either too old and/or not publicly available, require use of operating systems which are not supported by modern hardware, or simply do not meet the functional requirements demanded by practical use in real world settings. To address these issues, we designed REXEC, a decentralized, secure remote execution facility. It provides high availability, scalability, transparent remote execution, dynamic cluster configuration, decoupled node discovery and selection, a well-defined failure and cleanup model, parallel and distributed program support, and strong authentication and encryption. The system is implemented and is currently installed and in use on a 32-node cluster of 2-way SMPs running the Linux 2.2.5 operating system.
Design Issues of Process Migration Facilities in Distributed Systems
1990
Cited by 22 (0 self)
Distributed systems are composed of several loosely-coupled computers communicating over a high-bandwidth network. To achieve an even distribution of the workload in a distributed system, either preemptive or non-preemptive load distribution strategies are used. Preemptive load distribution involves process migration, while non-preemptive strategies are based on initial placement of processes on the machines. Process migration is a mechanism where a process on one machine is moved to another machine in a distributed system. This paper discusses the design of process migration facilities in distributed systems with respect to key issues, such as the system models on which the mechanisms are implemented, the hardware platforms they run on, the methods used in moving a process from one machine to another, the load distribution policies adopted, network transparency, etc.
1 Introduction
Enhancements to computer hardware have made "distributed computing systems" more and more available to ...
Market-based Cluster Resource Management
2001
Cited by 17 (0 self)
Resource management in high-performance, cluster computer systems is a challenging problem. Resources must be allocated amongst competing applications of varying levels of importance, and aggregate resource demand needs to be controlled to keep the system in a comfortable regime of operation. Effectively performing these tasks requires knowledge of user valuations of the resources being allocated and having a feedback signal that causes users to back off the system when it is overloaded. Unfortunately, current approaches to cluster resource management provide little, if any, means for users to express resource valuations and to influence their resource allocations. In addition, while feedback signals are provided, there are no associated incentives for users to pay attention to and respond to them. As a result, traditional systems are incapable of delivering the maximum possible value to users. The thesis of this work is that...
Distributed and Multiprocessor Scheduling
ACM Computing Surveys, 1996
Cited by 17 (1 self)
This chapter discusses CPU scheduling in parallel and distributed systems. CPU scheduling is part of a broader class of resource allocation problems, and is probably the most carefully studied such problem. The main motivation for multiprocessor scheduling is the desire for increased speed in the execution of a workload. Parts of the workload, called tasks, can be spread across several
Parallel Raytracing: A Case Study on Partitioning and Scheduling on Workstation Clusters
Proc. Thirtieth International Conference on System Sciences, Hawaii, 1997
Cited by 11 (4 self)
In this paper, a case study is presented which is aimed at investigating the performance of several parallel versions of the POV-Ray raytracing package implemented on a workstation cluster using the MPI message passing library. Based on a manager/worker scheme, variants of workload partitioning and message scheduling strategies, in conjunction with different task granularities, are evaluated with respect to their runtime behaviour. The results indicate that dynamic, adaptive strategies are required to cope with both the unbalanced workload characteristics of the parallel raytracing application and the different computational capabilities of the machines in a workstation cluster environment.
1 Introduction
Raytracing [9, 13, 24] is a widely used method for generating realistic-looking images on a computer, and it is employed by many 3D modelling and animation systems for the final image rendering. The input to a raytracing algorithm is the scene -- the description of the geometry...
Implementation of Decentralized Load Sharing in Networked Workstations Using the Condor Package
Journal of Parallel and Distributed Computing, 1997
Cited by 8 (0 self)
In recent years a number of load sharing (LS) mechanisms have been proposed or implemented to fully utilize system resources. We have designed and implemented a decentralized real-time LS mechanism based on the Condor package [17, 18]. Two important features of our design are use of region-change broadcasts in the information policy to provide each workstation with timely state information at minimum communication cost, and use of preferred lists in the location policy to avoid task collisions. With these two features, we remove the central manager workstation in Condor, configure its functionalities into each participating workstation, transform Condor into a decentralized LS mechanism, and equip Condor with the capability to tolerate single workstation failures. Also discussed are the experiments on the proposed LS mechanism and the off-the-shelf Condor package and our observations of empirical data.
Index Terms: distributed systems, adaptive load sharing, region-change broadcasts, ...

