Anatomy of a Resource Management System for HPC Clusters
In Annual Review of Scalable Computing, 2001
Cited by 18 (4 self)
Workstation clusters are often used not only for high-throughput computing in time-sharing mode but also for running complex parallel jobs in space-sharing mode. This poses several difficulties for the resource management system, which must be able to reserve computing resources for exclusive use and also to determine an optimal process mapping for a given system topology. On the basis of our CCS software, we describe the anatomy of a modern resource management system. Like Codine, Condor, and LSF, CCS provides mechanisms for user-friendly system access and management of clusters. But unlike them, CCS is targeted at the effective support of space-sharing parallel computers and even metacomputers. Among other features, CCS provides a versatile resource description facility, topology-based process mapping, pluggable schedulers, and hooks to metacomputer management.
Lessons Learned While Operating Two Large SCI Clusters
2001
Cited by 2 (0 self)
The availability of commodity high-performance components for workstations and networks has made it possible to build large, PC-based compute clusters at modest cost. These clusters seem to be a realistic alternative to proprietary, massively parallel systems with respect to the price/performance ratio. However, from the administration point of view, those systems are still often merely a collection of autonomous nodes connected by a fast short area network. Therefore, to provide the best possible performance to all users in daily work, a lot of work has to be done before the expected result is obtained.
Efficient Resource Management for Malleable Applications
2001
Cited by 2 (0 self)
In this paper we present a method for managing concurrent parallel applications on large shared-memory machines efficiently and fairly. It combines the advantages of space-sharing for tightly coupled parallel applications with the possibility of immediate job execution in time-sharing environments. An application parallelism manager...

