Using Processor Affinity in Loop Scheduling on Shared-Memory Multiprocessors
- IEEE Transactions on Parallel and Distributed Systems, 1994
Cited by 133 (2 self)
Loops are the single largest source of parallelism in many applications. One way to exploit this parallelism is to execute loop iterations in parallel on different processors. Previous approaches to loop scheduling attempt to achieve the minimum completion time by distributing the workload as evenly as possible, while minimizing the number of synchronization operations required. In this paper we consider a third dimension to the problem of loop scheduling on shared-memory multiprocessors: communication overhead caused by accesses to non-local data. We show that traditional algorithms for loop scheduling, which ignore the location of data when assigning iterations to processors, incur a significant performance penalty on modern shared-memory multiprocessors. We propose a new loop scheduling algorithm that attempts to simultaneously balance the workload, minimize synchronization, and co-locate loop iterations with the necessary data. We compare the performance of this new algorithm to ot...
Power Aware Page Allocation
- In Architectural Support for Programming Languages and Operating Systems, 2000
Cited by 121 (9 self)
One of the major challenges of post-PC computing is the need to reduce energy consumption, thereby extending the lifetime of the batteries that power these mobile devices. Memory is a particularly important target for efforts to improve energy efficiency. Memory technology is becoming available that offers power management features such as the ability to put individual chips in any one of several different power modes. In this paper we explore the interaction of page placement with static and dynamic hardware policies to exploit these emerging hardware features. In particular, we consider page allocation policies that can be employed by an informed operating system to complement the hardware power management strategies. We perform experiments using two complementary simulation environments: a trace-driven simulator with workload traces that are representative of mobile computing and an execution-driven simulator with a detailed processor/memory model and a more memory-intensive set of benchmarks (SPEC2000). Our results make a compelling case for a cooperative hardware/software approach for exploiting power-aware memory, with down to as little as 45% of the Energy·Delay for the best static policy and 1% to 20% of the Energy·Delay for a traditional full-power memory.
Evaluation of Multiprocessor Memory Systems Using Off-Line Optimal Behavior
- 1991
Cited by 8 (3 self)
Execution: A Technique for Efficiently Tracing Programs. Software: Practice and Experience, 20(12):1241-1258, December 1990.
Locality-Based Scheduling in Shared-Memory Multiprocessors
- Parallel Computing: Paradigms and Applications, 1993
Cited by 7 (0 self)
The last decade has produced enormous improvements in microprocessor performance without a corresponding improvement in memory or interconnection network performance. As a result, the relative cost of communication in shared-memory multiprocessors has increased dramatically. Although many applications could ignore the cost of communication and still achieve good performance on the previous generations of shared-memory machines, good performance on modern machines requires that communication be reduced or eliminated. One way to reduce the need for communication is to use scheduling policies that exploit knowledge of the location of data when assigning processes to processors, improving locality of reference by co-locating a process with the data it will require. This chapter presents an overview of the tradeoffs to be made in process scheduling, and evaluates locality-based scheduling techniques at the level of the operating system kernel, thread package, and parallelizing compiler.
Trace-Driven Simulation of Data-Alignment and other Factors affecting Update and Invalidate Based Coherent Memory
- 1994
Cited by 6 (5 self)
The exploitation of locality of reference in shared memory multiprocessors is one of the most important problems in parallel processing today. Locality can be managed at several levels: hardware, operating system, the compiler's runtime environment, and user level. In this paper we investigate the problem of exploiting locality at the operating system level and its interactions with the compiler and the architecture. Our main conclusion, based on trace-driven simulations of real applications, is that exploitation of locality is effective only if all three levels cooperate.
AS-COMA: An Adaptive Hybrid Shared Memory Architecture
- In Proceedings of the 1998 International Conference on Parallel Processing, 1998
Scalable shared memory multiprocessors traditionally use either a cache coherent nonuniform memory access (CC-NUMA) or simple cache-only memory architecture (S-COMA) memory architecture. Recently, hybrid architectures that combine aspects of both CC-NUMA and S-COMA have emerged. In this paper, we present two improvements over other hybrid architectures. The first improvement is a page allocation algorithm that prefers S-COMA pages at low memory pressures. Once the local free page pool is drained, additional pages are mapped in CC-NUMA mode until they suffer sufficient remote misses to warrant upgrading to S-COMA mode. The second improvement is a page replacement algorithm that dynamically backs off the rate of page remappings from CC-NUMA to S-COMA mode at high memory pressure. This design dramatically reduces the amount of kernel overhead and the number of induced cold misses caused by needless thrashing of the page cache. The resulting hybrid architecture is called adaptive S-COMA (AS-COMA).

