Gang Scheduling with Memory Considerations
- in Proc. of the 14th Intl. Parallel and Distributed Processing Symp., 2000
Cited by 49 (2 self)
A major problem with time slicing on parallel machines is memory pressure, as the resulting paging activity damages the synchronism among a job’s processes. An alternative is to impose admission controls, and only admit jobs that fit into the available memory. Despite suffering from delayed execution, this leads to better overall performance by preventing the harmful effects of paging and thrashing.
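The admission-control idea in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the `Job` record, the `admit_jobs` function, the per-node memory unit, and the strict first-come-first-served policy are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    mem_mb: int  # hypothetical per-node memory demand, in MB


def admit_jobs(queue, running, node_mem_mb):
    """Admit queued jobs in FIFO order while they fit in memory.

    A job is admitted only if its demand, added to the memory already
    held by running and newly admitted jobs, stays within node_mem_mb.
    Admission stops at the first job that does not fit (assumed policy).
    """
    used = sum(job.mem_mb for job in running)
    admitted = []
    while queue and used + queue[0].mem_mb <= node_mem_mb:
        job = queue.popleft()
        used += job.mem_mb
        admitted.append(job)
    return admitted


if __name__ == "__main__":
    waiting = deque([Job("a", 400), Job("b", 500), Job("c", 100)])
    started = admit_jobs(waiting, [Job("r", 200)], node_mem_mb=1000)
    print([job.name for job in started], [job.name for job in waiting])
```

In this sketch job `b` is held back because admitting it would exceed the node's memory, and `c` waits behind it. That is the delayed execution the abstract mentions as the price of avoiding paging.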
Implicit Coscheduling: Coordinated Scheduling with Implicit Information in Distributed Systems
- ACM Transactions on Computer Systems, 1998
Cited by 44 (2 self)
In this thesis, we formalize the concept of an implicitly-controlled system, also referred to as an implicit system. In an implicit system, cooperating components do not explicitly contact other components for control or state information; instead, components infer remote state by observing naturally-occurring local events and their corresponding implicit information, i.e., information available outside of a defined interface. Many systems, particularly in distributed and networked environments, have leveraged implicit control to simplify the implementation of services with autonomous components. To concretely demonstrate the advantages of implicit control, we propose and implement implicit coscheduling, an algorithm for dynamically coordinating the time...
Fine-Grain Distributed Shared Memory on Clusters of Workstations
- 1997
Cited by 30 (8 self)
Shared memory, one of the most popular models for programming parallel platforms, is becoming ubiquitous both in low-end workstations and high-end servers. With the advent of low-latency networking hardware, clusters of workstations strive to offer the same processing power as high-end servers for a fraction of the cost. In such environments, shared memory has been limited to page-based systems that control access to shared memory using the memory's page protection to implement shared memory coherence protocols. Unfortunately, false sharing and fragmentation problems force such systems to resort to weak consistency shared memory models that complicate the shared memory programming model.
Memory Usage in the LANL CM-5 Workload
- In Job Scheduling Strategies for Parallel Processing, 1997
Cited by 23 (6 self)
It is generally agreed that memory requirements should be taken into account in the scheduling of parallel jobs. However, so far the work on combined processor and memory scheduling has not been based on detailed information and measurements. To rectify this problem, we present an analysis of memory usage by a production workload on a large parallel machine, the 1024-node CM-5 installed at Los Alamos National Lab. Our main observations are:
-- The distribution of memory requests has strong discrete components, i.e. some sizes are much more popular than others.
-- Many jobs use a relatively small fraction of the memory available on each node, so there is some room for time slicing among several memory-resident jobs.
-- Larger jobs (using more nodes) tend to use more memory, but it is difficult to characterize the scaling of per-processor memory usage.
1 Introduction Resource management includes a number of distinct topics, such as scheduling and memory management. Howeve...
Multiprocessor Scheduling for High-Variability Service Time Distributions
- Scheduling Strategies for Parallel Processing, Lecture Notes in Computer Science, 1995
Cited by 19 (3 self)
Many disciplines have been proposed for scheduling and processor allocation in multiprogrammed multiprocessors for parallel processing. These have been, for the most part, designed and evaluated for workloads having relatively low variability in service demand. But with reports that variability in service demands at high performance computing centers can actually be quite high, these disciplines must be reevaluated. In this paper, we examine the performance of two well-known static scheduling disciplines, and propose preemptive versions of these that offer much better mean response times when the variability in service demand is high. We argue that, in systems in which dynamic repartitioning in applications is expensive or impossible, these preemptive disciplines are well suited for handling high variability in service demand.
A System Software Architecture for High-End Computing
- In Proceedings of SC'97, 1997
Cited by 15 (8 self)
Large MPP systems can neither solve grand-challenge scientific problems nor enable large scale industrial and governmental simulations if they rely on extensions to workstation system software. At Sandia National Laboratories we have developed, with our vendors, a new system architecture for high-end computing. Highest performance is achieved by providing applications with a light-weight interface to a collection of processing nodes. Usability is provided by creating node partitions specialized for user access, networking, and I/O. The entire system is glued together by a data movement interface which we call portals. Portals allow data to flow between processing nodes with minimal system overhead while maintaining a suitable degree of protection and reconfigurability. 1 Introduction The power of the last decade's supercomputers is now available in affordable desktop systems. However, the demand for ever increasing computational power has not abated. For example, the US Departm...
Parallel Network RAM: Effectively Utilizing Global Cluster Memory for Large Data-Intensive Parallel Programs
- In 2004 International Conference on Parallel Processing (ICPP’2004), 2004
Cited by 14 (2 self)
Large scientific parallel applications demand large amounts of memory space. Current parallel computing platforms schedule jobs without fully knowing their memory requirements. This leads to uneven memory allocation in which some nodes are overloaded. This, in turn, leads to disk paging, which is extremely expensive in the context of scientific parallel computing. To solve this problem, we propose a new peer-to-peer solution called Parallel Network RAM. This approach avoids the use of disk and better utilizes available RAM resources. This approach will allow larger problems to be solved while reducing the computational, communication and synchronization overhead typically involved in parallel applications.
Benefits of Speedup Knowledge in Memory-Constrained Multiprocessor Scheduling
- Performance Evaluation, 1996
Adaptive scheduling under memory pressure on multiprogrammed clusters
- In Proc. of the 2nd IEEE/ACM International Conference on Cluster Computing and the Grid (ccGrid’02), 2002
Cited by 10 (5 self)
We present a simple scheduling strategy that copes with the adverse effects of paging on multiprogrammed SMPs. We consider open, multiuser SMP servers, typically found in academic or industrial environments. Our strategy incorporates four uniquely combined features. It is adaptive, in the sense that the programs themselves take scheduling actions upon detecting memory pressure; it is dynamic, since programs detect the likelihood of paging at runtime by communicating with the operating system through a lightweight interface; it is preventive, because it takes scheduling actions before paging occurs; and it is non-intrusive, because the local scheduling actions taken by a program do not adversely affect other programs sharing the system, but act to their benefit. We present an efficient implementation of our strategy in Linux and show with a realistic production workload that it can improve the response time of the Linux kernel under memory pressure by up to a factor of eight and the throughput by up to a factor of four.
Adaptive Scheduling under Memory Constraints on Non-Dedicated Computational Farms
- Future Generation Computer Systems, 2003
Cited by 5 (0 self)
This paper presents scheduler extensions that enable better adaptation of parallel programs to the execution conditions of non-dedicated computational farms with limited memory resources. The purpose of the techniques is to prevent thrashing and co-schedule communicating threads, using two disjoint, yet cooperating extensions to the kernel scheduler. A thrashing prevention module enables memory-bound programs to adapt to memory shortage by suspending their threads at selected points of execution. Threads are suspended at memory allocation points so that memory is not over-committed by parallel jobs, which are assumed to be running as guests on the nodes of the computational farm. In the event of thrashing, parallel jobs are the first to release memory and help local resident jobs make progress. Adaptation is implemented using a shared-memory interface in the /proc filesystem and upcalls from the kernel to the user space. On an orthogonal axis, co-scheduling is implemented in the kernel with a heuristic that periodically boosts the priority of communicating threads. Using experiments on a cluster of workstations, we show that when a guest parallel job competes with general-purpose interactive, I/O-intensive, or CPU and memory-intensive load on the nodes of the cluster, thrashing prevention drastically reduces the slowdown of the job at memory utilization levels of 20% or higher. The slowdown of parallel jobs is reduced by up to a factor of 7. Co-scheduling provides a limited performance improvement at memory utilization levels below 20%, but has no significant effect at higher memory utilization levels.
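Suspending a guest thread at a memory allocation point can be sketched in user space as follows. This is only an illustration under stated assumptions, not the paper's kernel mechanism: `allocate_when_safe`, the `free_mb_probe` callable, and the polling loop are hypothetical stand-ins for the /proc interface and upcalls the abstract describes.

```python
import time


def allocate_when_safe(request_mb, free_mb_probe, poll_s=0.0, max_polls=10):
    """Wait at an allocation point until the request fits in free memory.

    free_mb_probe is a hypothetical callable returning the free memory in MB.
    Returns the number of polls spent suspended before the request fit,
    or None if it never fit within max_polls checks.
    """
    for polls in range(max_polls):
        if request_mb <= free_mb_probe():
            return polls  # safe to allocate without over-committing memory
        time.sleep(poll_s)  # stand-in for suspending the guest thread
    return None


if __name__ == "__main__":
    readings = iter([100, 150, 400])  # simulated free-memory samples, in MB
    print(allocate_when_safe(300, lambda: next(readings)))
```

The guest job defers its allocation while memory is short, which mirrors the abstract's goal of letting local resident jobs make progress first.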

