Eliminating synchronization overhead in automatically parallelized programs using dynamic feedback (1999)

by Pedro C. Diniz, Martin C. Rinard
Venue: ACM Transactions on Computer Systems

Documents citing this paper (6 results)

Effective Fine-Grain Synchronization For Automatically Parallelized Programs Using Optimistic Synchronization Primitives

by Martin C. Rinard - ACM Transactions on Computer Systems, 1999
Cited by 33 (5 self)
This paper presents our experience using optimistic synchronization to implement fine-grain atomic operations in the context of a parallelizing compiler for irregular, object-based computations. Our experience shows that the synchronization requirements of these programs differ significantly from those of traditional parallel computations, which use loop nests to access dense matrices using affine access functions. In addition to coarse-grain barrier synchronization, our irregular computations require synchronization primitives that support efficient fine-grain atomic operations ...
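
The optimistic pattern can be illustrated with a minimal Python sketch: read the current value, compute the new value without holding a lock, then commit only if no other thread has changed the value in the meantime, retrying otherwise. Python exposes no hardware compare-and-swap or load-linked/store-conditional instruction, so the hypothetical `VersionedCell` class emulates an atomic compare-and-swap with a short internal lock around the commit. The sketch shows the control structure only; it is an illustration under these assumptions, not the code generated by the compiler described in the paper.

```python
import threading


class VersionedCell:
    """A shared value updated optimistically (hypothetical helper class).

    Python has no hardware compare-and-swap, so compare_and_swap() emulates
    one with a short internal lock. Only the final commit is serialized; the
    new value is computed outside any lock.
    """

    def __init__(self, value):
        self._value = value
        self._commit_lock = threading.Lock()

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        # Emulated atomic CAS: install `new` only if the value is unchanged.
        with self._commit_lock:
            if self._value == expected:
                self._value = new
                return True
            return False


def optimistic_update(cell, fn):
    """Apply fn atomically: read, compute, try to commit, retry on conflict."""
    while True:
        old = cell.load()
        new = fn(old)                      # computed without holding a lock
        if cell.compare_and_swap(old, new):
            return new                     # commit succeeded
        # Another thread committed first: loop and recompute from fresh data.


def run_demo(num_threads=4, increments=1000):
    counter = VersionedCell(0)

    def worker():
        for _ in range(increments):
            optimistic_update(counter, lambda v: v + 1)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.load()


print(run_demo())  # prints 4000
```

Because the counter only ever increases, a successful compare-and-swap shows that no other update intervened, so every increment is applied exactly once.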

A Case for User-Level Dynamic Page Migration

by Dimitrios S. Nikolopoulos, Theodore S. Papatheodorou, Constantine D. Polychronopoulos, Jesus Labarta, Eduard Ayguadé, 2000
Cited by 11 (8 self)
This paper presents user-level dynamic page migration, a runtime technique that transparently enables parallel programs to tune their memory performance on distributed shared memory multiprocessors, using feedback obtained from dynamic monitoring of memory activity. Our technique exploits the iterative nature of parallel programs and information available to the program at both compile time and runtime to improve the accuracy and timeliness of page migrations and to amortize their overhead better than page migration engines implemented in the operating system. We present an adaptive page migration algorithm based on a competitive and a predictive criterion. The competitive criterion is used to correct poor page placement decisions of the operating system, while the predictive criterion makes the algorithm responsive to scheduling events that necessitate immediate page migrations, such as preemptions and migrations of threads. We also present a new technique for preventing page ping-pong and a mechanism for monitoring the performance of page migration algorithms at runtime and tuning their sensitive parameters accordingly. Our experimental evidence on an SGI Origin2000 shows that unmodified OpenMP codes linked with our runtime system for dynamic page migration are effectively immune to the page placement strategy of the operating system and the associated problems with data locality. Furthermore, our runtime system achieves solid performance improvements compared to the IRIX 6.5.5 page migration engine, for single parallel OpenMP codes and multiprogrammed workloads.
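
A competitive migration criterion of this general kind can be sketched as follows, assuming per-node access counters for each page are sampled once per iteration of the program's outer loop. The `threshold` factor, the `PageState` record and the freeze-after-N-moves guard are hypothetical illustrations; the paper's own ping-pong prevention technique and its predictive criterion are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PageState:
    home: int             # node whose memory currently holds the page
    migrations: int = 0   # number of times the page has been moved
    frozen: bool = False  # hypothetical guard: True once the page stops moving


def competitive_target(counts: Dict[int, int], page: PageState,
                       threshold: float = 2.0) -> Optional[int]:
    """Return the node the page should move to, or None to leave it in place.

    counts maps node id -> accesses to this page observed during the last
    monitoring interval (assumed: one iteration of the outer parallel loop).
    """
    if page.frozen:
        return None
    local = counts.get(page.home, 0)
    remote = {node: c for node, c in counts.items() if node != page.home}
    if not remote:
        return None
    best = max(remote, key=remote.get)
    # Migrate only if one remote node dominates the local node by `threshold`.
    if remote[best] > threshold * local:
        return best
    return None


def apply_migration(page: PageState, counts: Dict[int, int],
                    threshold: float = 2.0, max_migrations: int = 2) -> bool:
    """Move the page if the competitive criterion fires; freeze it after
    max_migrations moves (a simple stand-in for ping-pong prevention)."""
    target = competitive_target(counts, page, threshold)
    if target is None:
        return False
    page.home = target
    page.migrations += 1
    if page.migrations >= max_migrations:
        page.frozen = True
    return True


page = PageState(home=0)
print(apply_migration(page, {0: 10, 1: 50}), page.home)  # prints True 1
```

A second interval in which node 0 dominates would move the page back and, with `max_migrations=2`, freeze it there, so later intervals can no longer bounce it between nodes.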

The Architectural and Operating System Implications on the Performance of Synchronization on ccNUMA Multiprocessors

by Dimitrios S. Nikolopoulos, Theodore S. Papatheodorou - International Journal of Parallel Programming, 2001
Cited by 11 (0 self)
This paper investigates the performance of synchronization algorithms on ccNUMA multiprocessors from the perspectives of the architecture and the operating system. In contrast with previous related studies that emphasized the relative performance of synchronization algorithms, this paper takes a new approach by analyzing the sources of synchronization latency on ccNUMA architectures and how this latency can be reduced by leveraging hardware and software schemes in both dedicated and multiprogrammed execution environments. From the architectural perspective, the paper identifies the implications of directory-based cache coherence for the latency and scalability of synchronization primitives and examines whether and how simple hardware that accelerates synchronization instructions can be leveraged to reduce synchronization latency. From the operating system’s perspective, the paper evaluates user-level, kernel-level and hybrid algorithms for implementing scalable synchronization in multiprogrammed execution environments within a unified framework. In addition to addressing these issues, the paper contributes a new methodology for implementing fast synchronization algorithms on ccNUMA multiprocessors. The experiments are conducted on the SGI Origin2000, a popular commercial ccNUMA multiprocessor.
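
The abstract mentions hybrid algorithms for multiprogrammed environments without describing them. One widely used hybrid design, shown here purely as a generic illustration and not as the paper's algorithm, tries to acquire a lock at user level for a bounded number of attempts and only then falls back to a blocking acquire. The class name `SpinThenBlockLock` and the `spin_limit` parameter are hypothetical.

```python
import threading


class SpinThenBlockLock:
    """Hybrid lock: poll for a bounded number of attempts, then block.

    spin_limit is a hypothetical tuning parameter. Blocking after a bounded
    spin keeps a waiter from burning a processor when the lock holder has
    been preempted, which matters in multiprogrammed environments.
    """

    def __init__(self, spin_limit=100):
        self._lock = threading.Lock()
        self.spin_limit = spin_limit
        self.blocked_acquires = 0  # times the blocking slow path was needed

    def acquire(self):
        for _ in range(self.spin_limit):
            if self._lock.acquire(blocking=False):  # spin phase: non-blocking try
                return
        self._lock.acquire()        # block phase: wait until the holder releases
        self.blocked_acquires += 1  # safe to update: we now hold the lock

    def release(self):
        self._lock.release()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.release()
        return False


def run_demo(num_threads=4, increments=1000):
    lock = SpinThenBlockLock(spin_limit=50)
    total = [0]

    def worker():
        for _ in range(increments):
            with lock:
                total[0] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]


print(run_demo())  # prints 4000
```

An uncontended acquire succeeds on the first non-blocking try and never reaches the blocking path; the spin limit trades wasted cycles against the cost of being descheduled.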

A Comparison of Concurrent Programming and Cooperative Multithreading

by Aaron W. Keen, Takashi Ishihara, Justin T. Maris, Tiejun Li, Eugene F. Fodor, Ronald A. Olsson, 2001
Cited by 4 (3 self)
this paper to Olsson

Smartlocks: Lock Acquisition Scheduling for Self-Aware Synchronization

by Jonathan Eastep, David Wingate, Marco D. Santambrogio, Anant Agarwal - Proceedings of ICAC 2010, 2010
Cited by 4 (4 self)
As multicore processors become increasingly prevalent, system complexity is skyrocketing. The advent of the asymmetric multicore compounds this: it is no longer practical for an average programmer to balance the system constraints associated with today’s multicores and also worry about new problems like asymmetric partitioning and thread interference. Adaptive, or self-aware, computing has been proposed as one method to help application and system programmers confront this complexity. These systems take some of the burden off programmers by monitoring themselves and optimizing or adapting to meet their goals. This paper introduces a self-aware synchronization library for multicores and asymmetric multicores called Smartlocks. Smartlocks is a spin-lock library that adapts its internal ...

Eliminating Synchronization Bottlenecks Using Adaptive Replication

by Martin C. Rinard, Pedro C. Diniz, 2003
Cited by 1 (0 self)
This article presents a new technique, adaptive replication, for automatically eliminating synchronization bottlenecks in multithreaded programs that perform atomic operations on objects. Synchronization bottlenecks occur when multiple threads attempt to concurrently update the same object. It is often possible to eliminate synchronization bottlenecks by replicating objects. Each thread can then update its own local replica without synchronization and without interacting with other threads. When the computation needs to access the original object, it combines the replicas to produce the correct values in the original object. One potential problem is that eagerly replicating all objects may lead to performance degradation and excessive memory consumption. Adaptive ...
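
The replicate-and-combine idea can be sketched for a commutative update such as a running sum. Because the abstract is cut off before it describes the adaptive policy, this sketch assumes a hypothetical trigger: a thread switches to a private replica only after a non-blocking lock acquire on the original object fails. The `AdaptiveAccumulator` class and that trigger are illustrative assumptions, not the article's method.

```python
import threading


class AdaptiveAccumulator:
    """A shared running sum whose updates commute, so replicas can be merged.

    Hypothetical adaptive rule: a thread updates the original value when its
    lock is free; if the lock is busy (contention), the thread switches to a
    private replica, created lazily on first use, and updates it without
    synchronization.
    """

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()
        self._local = threading.local()       # per-thread replica slot
        self._replicas = []                   # all replicas, for combining
        self._replicas_lock = threading.Lock()

    def add(self, amount):
        if self._lock.acquire(blocking=False):  # uncontended: update original
            try:
                self._value += amount
            finally:
                self._lock.release()
            return
        replica = getattr(self._local, "replica", None)
        if replica is None:                     # contention seen: replicate
            replica = [0]
            self._local.replica = replica
            with self._replicas_lock:
                self._replicas.append(replica)
        replica[0] += amount                    # private: no lock needed

    def combine(self):
        """Fold every replica into the original value.

        Call only after all updating threads have finished (e.g., joined).
        """
        with self._lock:
            for replica in self._replicas:
                self._value += replica[0]
                replica[0] = 0
            return self._value


def run_demo(num_threads=4, adds=1000):
    acc = AdaptiveAccumulator()

    def worker():
        for _ in range(adds):
            acc.add(1)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return acc.combine()


print(run_demo())  # prints 4000
```

The combined total is the same no matter which threads happened to replicate; only the number of replicas varies from run to run, which is the point of replicating lazily rather than eagerly for every thread.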
Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University