Results 1 - 3 of 3
The MOL Project: An Open, Extensible Metacomputer
- In Heterogeneous Computing Workshop HCW'97 at IPPS'97, 1997
- Cited by 26 (4 self)
Distributed high-performance computing, so-called metacomputing, refers to the coordinated use of a pool of geographically distributed high-performance computers. The user's view of an ideal metacomputer is that of a powerful monolithic virtual machine. The implementor's view, on the other hand, is that of a variety of interacting services implemented in a scalable and extensible manner. In this paper, we present MOL, the Metacomputer Online environment. In contrast to other metacomputing environments, MOL is not based on specific programming models or tools. Rather, it has been designed as an open, extensible software system comprising a variety of software modules, each specialized in one specific task such as resource scheduling, job control, task communication, task migration, and user interface, among others. All of these modules exist and are working. The main challenge in the design of MOL lies in the specification of suitable, generic interfaces for the effective ...
Anatomy of a Resource Management System for HPC Clusters
- In Annual Review of Scalable Computing, 2001
- Cited by 18 (4 self)
Workstation clusters are often used not only for high-throughput computing in time-sharing mode but also for running complex parallel jobs in space-sharing mode. This poses several difficulties for the resource management system, which must be able to reserve computing resources for exclusive use and also determine an optimal process mapping for a given system topology. On the basis of our CCS software, we describe the anatomy of a modern resource management system. Like Codine, Condor, and LSF, CCS provides mechanisms for user-friendly system access and management of clusters. Unlike them, however, CCS is targeted at the effective support of space-sharing parallel computers and even metacomputers. Among other features, CCS provides a versatile resource description facility, topology-based process mapping, pluggable schedulers, and hooks to metacomputer management.
Resource Management for High-Performance PC Clusters
- Lecture Notes in Computer Science, 1999
- Cited by 4 (2 self)
With the recent availability of cost-effective network cards for the PCI bus, researchers have been tempted to build large compute clusters from standard PCs. Many of these clusters are operated with workstation cluster management software in high-throughput or single-user mode. For very large clusters with more than 100 PEs, however, it becomes necessary to implement full-fledged resource management software that allows the system to be partitioned for multi-user access. In this paper, we present our Computing Center Software (CCS), which was originally designed for managing massively parallel high-performance computers and has now been adapted to modern workstation clusters. It provides partitioning of exclusive and non-exclusive resources; hardware-independent scheduling of interactive and batch jobs; open, extensible interfaces to other resource management systems; and a high degree of reliability.

