• Documents
  • Authors
  • Tables
  • Other Seers ▼
    RefSeer AckSeer CollabSeer SeerSeer
  • Log in
  • Sign up
  • MetaCart

CiteSeerX logo

Advanced Search Include Citations
Advanced Search Include Citations | Disambiguate

Effective Fine-Grained Synchronization for Automatically Parallelized Programs Using Optimistic Synchronization Primitives (1999)

by Martin Rinard
Venue:ACM Transactions on Computer Systems
Add To MetaCart

Tools

Sorted by:
Results 1 - 10 of 23
Next 10 →

Pointer Analysis for Multithreaded Programs

by Radu Rugina, Martin Rinard - ACM SIGPLAN 99 , 1999
"... This paper presents a novel interprocedural, flow-sensitive, and context-sensitive pointer analysis algorithm for multithreaded programs that may concurrently update shared pointers. For each pointer and each program point, the algorithm computes a conservative approximation of the memory locations ..."
Abstract - Cited by 125 (13 self) - Add to MetaCart
This paper presents a novel interprocedural, flow-sensitive, and context-sensitive pointer analysis algorithm for multithreaded programs that may concurrently update shared pointers. For each pointer and each program point, the algorithm computes a conservative approximation of the memory locations to which that pointer may point. The algorithm correctly handles a full range of constructs in multithreaded programs, including recursive functions, function pointers, structures, arrays, nested structures and arrays, pointer arithmetic, casts between pointer variables of different types, heap and stack allocated memory, shared global variables, and thread-private global variables. We have implemented the algorithm in the SUIF compiler system and used the implementation to analyze a sizable set of multithreaded programs written in the Cilk multithreaded programming language. Our experimental results show that the analysis has good precision and converges quickly for our set of Cilk programs.

Speculative Synchronization: Applying Thread-Level Speculation to Explicitly Parallel Applications

by Jose F. Martinez, Josep Torrellas - ASPLOS X , 2002
"... Barriers, locks, and flags are synchronizing operations widely used by programmers and parallelizing compilers to produce race-free parallel programs. Often times, these operations are placed suboptimally, either because of conservative assumptions about the program, or merely for code simplicity. W ..."
Abstract - Cited by 75 (7 self) - Add to MetaCart
Barriers, locks, and flags are synchronizing operations widely used by programmers and parallelizing compilers to produce race-free parallel programs. Often times, these operations are placed suboptimally, either because of conservative assumptions about the program, or merely for code simplicity. We propose

DCAS-Based Concurrent Deques

by Ole Agesen, David L. Detlefs, Christine H. Flood, Alexander T. Garthwaite, Paul A. Martin, Nir N. Shavit, Guy L. Steele, Jr. , 2000
"... The computer industry is currently examining the use of strong synchronization operations such as double compare-and-swap (DCAS) as a means of supporting non-blocking synchronization on tomorrow's multiprocessor machines. However, before such a strong primitive will be incorporated into hardware des ..."
Abstract - Cited by 20 (6 self) - Add to MetaCart
The computer industry is currently examining the use of strong synchronization operations such as double compare-and-swap (DCAS) as a means of supporting non-blocking synchronization on tomorrow's multiprocessor machines. However, before such a strong primitive will be incorporated into hardware design, its utility needs to be proven by developing a body of effective non-blocking data structures using DCAS. As part of this effort, we present two new linearizable non-blocking implementations of concurrent deques using the DCAS operation. The first uses an array representation, and improves on former algorithms by allowing uninterrupted concurrent access to both ends of the deque while correctly handling the difficult boundary cases when the deque is empty or full. The second uses a linked-list representation, and is the first non-blocking unbounded-memory deque implementation. It too allows uninterrupted concurrent access to both ends of the deque.

Using Early Phase Termination To Eliminate Load Imbalances At Barrier Synchronization Points

by Martin Rinard
"... We present a new technique, early phase termination, for eliminating idle processors in parallel computations that use barrier synchronization. This technique simply terminates each parallel phase as soon as there are too few remaining tasks to keep all of the processors busy. Although this techniqu ..."
Abstract - Cited by 9 (5 self) - Add to MetaCart
We present a new technique, early phase termination, for eliminating idle processors in parallel computations that use barrier synchronization. This technique simply terminates each parallel phase as soon as there are too few remaining tasks to keep all of the processors busy. Although this technique completely eliminates the idling that would otherwise occur at barrier synchronization points, it may also change the computation and therefore the result that the computation produces. We address this issue by providing probabilistic distortion models that characterize how the use of early phase termination distorts the result that the computation produces. Our experimental results show that for our set of benchmark applications, 1) early phase termination can improve the performance of the parallel computation, 2) the distortion is small (or can be made to be small with the use of an appropriate compensation technique) and 3) the distortion models provide accurate and tight distortion bounds. These bounds can enable users to evaluate the effect of early phase termination and confidently accept results from parallel computations that use this technique if they find the distortion bounds to be acceptable. Finally, we identify a general computational pattern that works well with early phase termination and explain why computations that exhibit this pattern can tolerate the early termination of parallel tasks without producing unacceptable results.

Eliminating Synchronization Bottlenecks in Object-Based Programs Using Adaptive Replication

by Martin Rinard , Pedro Diniz - IN PROCEEDINGS OF 1999 ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING (ICS '99 , 1999
"... This paper presents a technique, adaptive replication, for automatically eliminating synchronization bottlenecks in multithreaded programs that perform atomic operations on objects. Synchronization bottlenecks occur when multiple threads attempt to concurrently update the same object. It is often po ..."
Abstract - Cited by 8 (2 self) - Add to MetaCart
This paper presents a technique, adaptive replication, for automatically eliminating synchronization bottlenecks in multithreaded programs that perform atomic operations on objects. Synchronization bottlenecks occur when multiple threads attempt to concurrently update the same object. It is often possible to eliminate synchronization bottlenecks by replicating objects. Each thread can then update its own local replica without synchronization and without interacting with other threads. When the computation needs to access the original object, it combines the replicas to produce the correct values in the original object. One potential problem is that eagerly replicating all objects may lead to performance degradation and excessive memory consumption.

Even Better DCAS-Based Concurrent Deques

by David L. Detlefs, Christine H. Flood, Alexander T. Garthwaite, Er T. Garthwaite, Paul A. Martin, Nir N. Shavit, Jr., Guy L. Steele - In Proceedings of the 14th International Conference on Distributed Computing , 2000
"... The computer industry is examining the use of strong synchronization operations such as double compare-and-swap (DCAS) as a means of supporting non-blocking synchronization on tomorrow's multiprocessor machines. However, before such a primitive will be incorporated into hardware design, its util ..."
Abstract - Cited by 7 (1 self) - Add to MetaCart
The computer industry is examining the use of strong synchronization operations such as double compare-and-swap (DCAS) as a means of supporting non-blocking synchronization on tomorrow's multiprocessor machines. However, before such a primitive will be incorporated into hardware design, its utility needs to be proven by developing a body of e#ective non-blocking data structures using DCAS.

Preemption-Based Avoidance of Priority Inversion for Java

by Adam Welc, Antony L. Hosking, Suresh Jagannathan - In Proceedings of the 2004 International Conference on Parallel Processing (ICPP , 2004
"... Priority inversion occurs in concurrent programs when low-priority threads hold shared resources needed by some high-priority thread, causing them to block indefinitely. Shared resources are usually guarded by low-level synchronization primitives such as mutual-exclusion locks, semaphores, or monito ..."
Abstract - Cited by 7 (2 self) - Add to MetaCart
Priority inversion occurs in concurrent programs when low-priority threads hold shared resources needed by some high-priority thread, causing them to block indefinitely. Shared resources are usually guarded by low-level synchronization primitives such as mutual-exclusion locks, semaphores, or monitors. There are two existing solutions to priority inversion. The first, establishing highlevel scheduling invariants over synchronization primitives to eliminate priority inversion a priori, is difficult in practice and undecidable in general. Alternatively, run-time avoidance mechanisms such as priority inheritance still force high-priority threads to wait until desired resources are released. We describe a novel compiler and run-time solution to the problem of priority inversion, along with experimental evaluation of its effectiveness. Our approach allows preemption of any thread holding a resource needed by higherpriority threads, forcing it to release its claim on the resource, roll back its execution to the point at which the shared resource was first acquired, and discard any updates made in the interim.

Allocating memory in a lock-free manner

by Anders Gidenstam, Marina Papatriantafilou, Philippas Tsigas , 2005
"... The potential of multiprocessor systems is often not fully realized by their system services. Certain synchronization methods, such as lock-based ones, may limit the parallelism. It is significant to see the impact of wait/lock-free synchronization design in key services for multiprocessor systems, ..."
Abstract - Cited by 7 (3 self) - Add to MetaCart
The potential of multiprocessor systems is often not fully realized by their system services. Certain synchronization methods, such as lock-based ones, may limit the parallelism. It is significant to see the impact of wait/lock-free synchronization design in key services for multiprocessor systems, such as the memory allocation service. Efficient, scalable memory allocators for multithreaded applications on multiprocessors is a significant goal of recent research projects. We propose a lock-free memory allocator, to enhance the parallelism in the system. Its architecture is inspired by Hoard, a successful concurrent memory allocator, with a modular, scalable design that preserves scalability and helps avoiding false-sharing and heap blowup. Within our effort on designing appropriate lock-free algorithms to construct this system, we propose a new non-blocking data structure called flat-sets, supporting conventional “internal” operations as well as “inter-object” operations, for moving items between flat-sets. We implemented the memory allocator in a set of multiprocessor systems (UMA Sun Enterprise 450 and ccNUMA Origin 3800) and studied its behaviour. The results show that the good properties of Hoard w.r.t. false-sharing and heap-blowup are preserved, while the scalability properties are enhanced even further with the help of lock-free synchronization.

Pointer Analysis for Structured Parallel Programs

by Radu Rugina, Martin C. Rinard , 2003
"... This paper presents a novel interprocedural, ow-sensitive, and context-sensitive pointer analysis algorithm for multithreaded programs that may concurrently update shared pointers. The algorithm is designed to handle programs with structured parallel constructs, including fork-join constructs, paral ..."
Abstract - Cited by 7 (1 self) - Add to MetaCart
This paper presents a novel interprocedural, ow-sensitive, and context-sensitive pointer analysis algorithm for multithreaded programs that may concurrently update shared pointers. The algorithm is designed to handle programs with structured parallel constructs, including fork-join constructs, parallel loops, and conditionally spawned threads. For each pointer and each program point, the algorithm computes a conservative approximation of the memory locations to which that pointer may point. The algorithm correctly handles a wide range of programming language constructs, including recursive functions, recursively generated parallelism, function pointers, structures, arrays, nested structures and arrays, pointer arithmetic, casts between dierent pointer types, heap and stack allocated memory, shared global variables, and thread-private global variables. We have implemented the algorithm in the SUIF compiler system and used the implementation to analyze a set of multithreaded programs written in the Cilk programming language. Our experimental results show that the analysis has good precision and converges quickly for our set of Cilk programs

Room Synchronizations

by Guy E. Blelloch, Perry Cheng, Phillip B. Gibbons , 2001
"... We present a class of synchronization called room synchronizations and show how this class can be used to implement asynchronous parallel queues and stacks with constant time access (assuming a fetch-and-add operation). The room synchronization problem involves supporting a set of m mutually exclusi ..."
Abstract - Cited by 6 (3 self) - Add to MetaCart
We present a class of synchronization called room synchronizations and show how this class can be used to implement asynchronous parallel queues and stacks with constant time access (assuming a fetch-and-add operation). The room synchronization problem involves supporting a set of m mutually exclusive "rooms" where any number of users can execute code simultaneously in any one of the rooms, but no two users can simultaneously execute code in separate rooms. Users asynchronously request permission to enter specified rooms, and neither the arrival time nor the arrival order nor the desired room of such requests are known ahead of time. We describe an algorithm for room synchronizations, and prove it satisfies a number of desirable properties. We have implemented our algorithm on a Sun UltraEnterprise 10000 multiprocessor. We present experimental results comparing an implementation of a parallel stack using room synchronizations to one using locks, demonstrating a significant scalability advantage for room synchronizations.
The National Science Foundation
  • About CiteSeerX
  • Submit Documents
  • Privacy Policy
  • Help
  • Data
  • Source
  • Contact Us

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University