Weak Ordering -- A New Definition
1990
Cited by 213 (12 self)
The memory model that programmers commonly, and often implicitly, assume for a shared-memory multiprocessor is sequential consistency. This model guarantees that all memory accesses will appear to execute atomically and in program order. An alternative model, weak ordering, offers greater performance potential. Weak ordering was first defined by Dubois, Scheurich and Briggs in terms of a set of rules for hardware that have to be made visible to software. The central hypothesis of this work is that programmers prefer to reason about sequentially consistent memory, rather than having to think about weaker memory, or even write buffers. Following this hypothesis, we re-define weak ordering as a contract between software and hardware. By this contract, software agrees to some formally specified constraints, and hardware agrees to appear sequentially consistent to at least the software that obeys those constraints. We illustrate the power of the new definition with a set of software constraints that forbid data races and an implementation for cache-coherent systems that is not allowed by the old definition.
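To make the gap between sequential consistency and write buffers concrete, the Python sketch below exhaustively enumerates executions of the classic store-buffering litmus test under two toy models: plain sequential consistency, and a model in which each processor has a FIFO write buffer with load forwarding. The program encoding, the `outcomes` function, and the buffer model are illustrative assumptions, not the paper's formalism. The test program has data races on `x` and `y`, so it falls outside the software side of a data-race-free contract, and the non-SC result it can observe with buffers is the kind of behavior such a contract leaves unconstrained for racy code.

```python
# Store-buffering litmus test. Thread 0: x = 1; r0 = y. Thread 1: y = 1; r1 = x.
# Instructions are ("st", var, value) or ("ld", var, register); memory starts at 0.
PROGRAM = (
    (("st", "x", 1), ("ld", "y", "r0")),
    (("st", "y", 1), ("ld", "x", "r1")),
)


def outcomes(program, buffered):
    """Return the set of final (r0, r1) values over every possible execution."""
    results = set()

    def explore(pcs, memory, buffers, regs):
        finished = all(pc == len(thread) for pc, thread in zip(pcs, program))
        if finished and not any(buffers):
            results.add((regs["r0"], regs["r1"]))
            return
        for tid, thread in enumerate(program):
            # Choice 1: drain this thread's oldest buffered store to memory.
            if buffers[tid]:
                (var, val), rest = buffers[tid][0], buffers[tid][1:]
                drained = buffers[:tid] + (rest,) + buffers[tid + 1:]
                explore(pcs, {**memory, var: val}, drained, regs)
            # Choice 2: execute this thread's next instruction, if any remain.
            if pcs[tid] == len(thread):
                continue
            op, var, arg = thread[pcs[tid]]
            next_pcs = pcs[:tid] + (pcs[tid] + 1,) + pcs[tid + 1:]
            if op == "st" and buffered:
                queued = buffers[:tid] + (buffers[tid] + ((var, arg),),) + buffers[tid + 1:]
                explore(next_pcs, memory, queued, regs)
            elif op == "st":
                explore(next_pcs, {**memory, var: arg}, buffers, regs)
            else:
                # A load sees the thread's newest buffered store to var, else memory.
                pending = [v for (b_var, v) in buffers[tid] if b_var == var]
                value = pending[-1] if pending else memory.get(var, 0)
                explore(next_pcs, memory, buffers, {**regs, arg: value})

    explore((0,) * len(program), {}, ((),) * len(program), {})
    return results
```

Under this sketch, `sorted(outcomes(PROGRAM, buffered=False))` gives `[(0, 1), (1, 0), (1, 1)]`. The buffered model additionally admits `(0, 0)`, in which both loads read memory while both stores are still sitting in the write buffers.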
Techniques for Debugging Parallel Programs with Flowback Analysis
1991
Cited by 84 (8 self)
Flowback analysis is a powerful technique for debugging programs. It allows the programmer to examine dynamic dependences in a program's execution history without having to re-execute the program. The goal is to present to the programmer a graphical view of the dynamic program dependences. We are building a system, called PPD, that performs flowback analysis while keeping the execution time overhead low. We also extend the semantics of flowback analysis to parallel programs. This paper describes details of the graphs and algorithms needed to implement efficient flowback analysis for parallel programs. Execution time overhead is kept low by recording only a small amount of trace information during a program's execution. We use semantic analysis and a technique called incremental tracing to keep the time and space overhead low. As part of the semantic analysis, PPD uses a static program dependence graph structure that reduces the amount of work done at compile time and takes advantage of the dynamic...
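A flowback query asks which earlier events produced a given value, and answers it by walking dynamic dependences backward. The sketch below assumes a hypothetical trace of `(event, defined_variable, used_variables)` records for a single thread and logs every event. It does not model PPD's incremental tracing, which avoids full logging, or the synchronization edges needed for parallel programs.

```python
def build_dependences(trace):
    """Map each event to the events whose values it read (last writer of each used variable)."""
    last_def = {}  # variable -> event that most recently defined it
    deps = {}
    for event_id, defined, used in trace:
        deps[event_id] = [last_def[v] for v in used if v in last_def]
        last_def[defined] = event_id
    return deps


def flowback(deps, event_id):
    """Return every earlier event that event_id transitively depends on."""
    seen, stack = set(), [event_id]
    while stack:
        for producer in deps[stack.pop()]:
            if producer not in seen:
                seen.add(producer)
                stack.append(producer)
    return seen


# Hypothetical single-thread trace: (event, variable defined, variables used).
FLOW_TRACE = [
    ("e1", "a", []),          # a = input()
    ("e2", "b", []),          # b = 10
    ("e3", "c", ["a"]),       # c = a * 2
    ("e4", "a", ["b"]),       # a = b + 1
    ("e5", "d", ["c", "b"]),  # d = c - b
]
```

Here `flowback(build_dependences(FLOW_TRACE), "e5")` returns `{"e1", "e2", "e3"}`. Event `e4` is excluded because it redefines `a` only after `c` was computed from the value written by `e1`.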
What Are Race Conditions? Some Issues and Formalizations
LOPLAS, 1992
Cited by 77 (0 self)
In shared-memory parallel programs that use explicit synchronization, race conditions result when accesses to shared memory are not properly synchronized. Race conditions are often considered to be manifestations of bugs, since their presence can cause the program to behave unexpectedly. Unfortunately, there has been little agreement in the literature as to precisely what constitutes a race condition. Two different notions have been implicitly considered: one pertaining to programs intended to be deterministic (which we call general races) and the other to nondeterministic programs containing critical sections (which we call data races). However, the differences between general races and data races have not yet been recognized. This paper examines these differences by characterizing races using a formal model and exploring their properties. We show that two variations of each type of race exist: feasible general races and data races capture the intuitive notions desired for debugging, and apparent races capture less accurate notions implicitly assumed by most dynamic race detection methods. We also show that locating feasible races is an NP-hard problem, implying that only the apparent races, which are approximations to feasible races, can be detected in practice. The complexity of dynamically locating apparent races depends on the type of synchronization used by the program. Apparent ...
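Apparent races are judged against the ordering that the observed synchronization imposes. The sketch below assumes a hypothetical trace format with event-style `post`/`wait` synchronization and uses vector clocks, a standard happens-before technique that is not necessarily the paper's formal model. It flags conflicting accesses (same variable, different threads, at least one write) that the observed synchronization leaves unordered. Because it sees only the synchronization order that actually occurred, it reports apparent races, and it makes no attempt to separate general races from data races.

```python
from itertools import combinations

# Hypothetical observed trace: (thread, op, target), op in {"read", "write", "post", "wait"}.
SYNC_TRACE = [
    (0, "write", "x"),
    (0, "post", "done"),
    (1, "wait", "done"),
    (1, "read", "x"),
    (1, "write", "y"),
    (0, "read", "y"),
]


def ordered_before(u, v):
    """Vector-clock test: the event with clock u happened before (or is) the one with clock v."""
    return all(a <= b for a, b in zip(u, v))


def apparent_races(trace, n_threads):
    """Return (i, j) trace indices of conflicting accesses unordered by synchronization."""
    clocks = [[0] * n_threads for _ in range(n_threads)]
    posted = {}    # event variable -> poster's clock at the post
    accesses = []  # (trace index, thread, op, variable, clock snapshot)
    for index, (tid, op, target) in enumerate(trace):
        clocks[tid][tid] += 1
        if op == "post":
            posted[target] = list(clocks[tid])
        elif op == "wait":
            # Assumes the matching post appears earlier in the observed trace.
            clocks[tid] = [max(a, b) for a, b in zip(clocks[tid], posted[target])]
        else:
            accesses.append((index, tid, op, target, list(clocks[tid])))
    races = []
    for (i, t1, op1, v1, c1), (j, t2, op2, v2, c2) in combinations(accesses, 2):
        conflicting = v1 == v2 and t1 != t2 and "write" in (op1, op2)
        if conflicting and not ordered_before(c1, c2) and not ordered_before(c2, c1):
            races.append((i, j))
    return races
```

For `SYNC_TRACE`, the write and read of `x` are ordered through `post`/`wait`, so `apparent_races(SYNC_TRACE, 2)` returns `[(4, 5)]`. That pair is thread 1's unsynchronized write of `y` and thread 0's read of it.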
On-the-fly Detection of Data Races for Programs with Nested Fork-Join Parallelism
In Proceedings of the 1991 Supercomputer Debugging Workshop, 1991
Cited by 70 (3 self)
Detecting data races in shared-memory parallel programs is an important debugging problem. This paper presents a new protocol for run-time detection of data races in executions of shared-memory programs with nested fork-join parallelism and no other interthread synchronization. This protocol has significantly smaller worst-case run-time overhead than previous techniques. The worst-case space required by our protocol when monitoring an execution of a program P is O(VN), where V is the number of shared variables in P, and N is the maximum dynamic nesting of parallel constructs in P's execution. The worst-case time required to perform any monitoring operation is O(N). We formally prove that our new protocol always reports a non-empty subset of the data races in a monitored program execution and describe how this property leads to an effective debugging strategy.
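In nested fork-join programs with no other synchronization, whether two accesses are concurrent can be read off labels whose length grows only with nesting depth, which is where an O(N) bound per check comes from. The sketch below is a hypothetical labeling scheme and access-history monitor written for illustration. It is not the protocol from the paper, and unlike that protocol, its per-variable read list is not bounded by O(N). Labels alternate a sequence counter and a branch index. A fork appends `(branch, 0)`, and a join increments the parent's last counter. Under these assumptions, two labels are concurrent exactly when they first differ at a branch position.

```python
def fork(parent, n):
    """Labels for the n branches of a fork executed by the task labelled parent."""
    return [parent + (branch, 0) for branch in range(n)]


def join(parent):
    """Label the parent continues with after all branches of its fork have joined."""
    return parent[:-1] + (parent[-1] + 1,)


def concurrent(a, b):
    """Labels alternate counter, branch, counter, ...; segments are concurrent iff
    the labels first differ at a branch (odd) position. Equal labels, or one label
    being a prefix of the other (an ancestor before the fork), mean ordered."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i % 2 == 1
    return False


class RaceMonitor:
    """Keeps, per variable, the last write and the reads not yet superseded."""

    def __init__(self):
        self.last_write, self.reads, self.races = {}, {}, []

    def read(self, var, label):
        w = self.last_write.get(var)
        if w is not None and concurrent(w, label):
            self.races.append((var, w, label))
        # Drop earlier reads ordered before this one: a later write concurrent
        # with such a read is also concurrent with this one, so a race stays visible.
        kept = [r for r in self.reads.get(var, []) if concurrent(r, label)]
        self.reads[var] = kept + [label]

    def write(self, var, label):
        for prev in self.reads.get(var, []) + [self.last_write.get(var)]:
            if prev is not None and concurrent(prev, label):
                self.races.append((var, prev, label))
        self.last_write[var], self.reads[var] = label, []


# Usage: one fork with two branches under the root task.
root = (0,)
monitor = RaceMonitor()
monitor.write("x", root)             # before the fork: ordered with both branches
left, right = fork(root, 2)          # (0, 0, 0) and (0, 1, 0)
monitor.read("x", left)
monitor.write("x", right)            # concurrent with the read in `left`: a race
monitor.read("x", join(root))        # after the join at (1,): ordered, no race
```

After the usage lines run, `monitor.races` holds the single race `("x", (0, 0, 0), (0, 1, 0))`.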
Improving the Accuracy of Data Race Detection
In Proceedings of the 1991 Conference on the Principles and Practice of Parallel Programming, 1991
Cited by 65 (6 self)
For shared-memory parallel programs that use explicit synchronization, data race detection is an important part of debugging. A data race exists when concurrently executing sections of code access common shared variables. In programs intended to be data race free, they are sources of nondeterminism usually considered bugs. Previous methods for detecting data races in executions of parallel programs can determine when races occurred, but can report many data races that are artifacts of others and not direct manifestations of program bugs. Artifacts exist because some races can cause others and can also make false races appear real. Such artifacts can overwhelm the programmer with information irrelevant for debugging. This paper presents results showing how to identify nonartifact data races by validation and ordering. Data race validation attempts to determine which races involve events that either did execute concurrently or could have (called feasible data races). We show how each de...
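One simple reading of the ordering idea is that a race whose events are preceded by another race's events may be an artifact of that earlier race. The filter below implements only that criterion, using assumed vector clocks. It is a hypothetical illustration. It does not perform the paper's validation step, which decides whether the events did or could have executed concurrently, and it may keep or drop races differently from the paper's algorithm.

```python
def happened_before(u, v):
    """Strict vector-clock order: the event with clock u happened before the one with clock v."""
    return u != v and all(a <= b for a, b in zip(u, v))


def unaffected_races(races, clock):
    """Keep races in which no event is preceded by an event of a different race."""
    kept = []
    for race in races:
        other_events = {e for r in races if r != race for e in r}
        if not any(happened_before(clock[o], clock[e]) for o in other_events for e in race):
            kept.append(race)
    return kept


# Hypothetical vector clocks for a two-thread execution.
CLOCK = {"e1": (1, 0), "e2": (0, 1), "e3": (2, 1), "e4": (1, 2)}
RACES = [("e1", "e2"), ("e3", "e4")]
```

Here `unaffected_races(RACES, CLOCK)` keeps only `("e1", "e2")`. Both events of the second race come after events of the first, so the second race is treated as a possible artifact.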
Memory Consistency Models for Shared-Memory Multiprocessors
WRL Research Report, 1995
Cited by 61 (1 self)
The memory consistency model for a shared-memory multiprocessor specifies the behavior of memory with respect to read and write operations from multiple processors. As such, the memory model influences many aspects of system design, including the design of programming languages, compilers, and the underlying hardware. Relaxed models that impose fewer memory ordering constraints offer the potential for higher performance by allowing hardware and software to overlap and reorder memory operations. However, fewer ordering guarantees can compromise programmability and portability. Many of the previously proposed models either fail to provide reasonable programming semantics or are biased toward programming ease at the cost of sacrificing performance. Furthermore, the lack of consensus on an acceptable model hinders software portability across different systems. This dissertation focuses on providing a balanced solution that directly addresses the trade-off between programming ease and performance. To address programmability, we propose an alternative method for specifying memory behavior that presents a higher level abstraction to the programmer. We show that with only a few types of information supplied by the ...
Designing Memory Consistency Models for Shared-Memory Multiprocessors
1993
Cited by 51 (8 self)
The memory consistency model (or memory model) of a shared-memory multiprocessor system influences both the performance and the programmability of the system. The simplest and most intuitive model for programmers, sequential consistency, restricts the use of many performance-enhancing optimizations exploited by uniprocessors. For higher performance, several alternative models have been proposed. However, many of these are hardware-centric in nature and difficult to program. Further, the multitude of seemingly unrelated memory models inhibits portability. We use the 3P criteria of programmability, portability, and performance to assess memory models, and find current models lacking in one or more of these criteria. This thesis establishes a unifying framework for reasoning about memory models that leads to models that adequately satisfy the 3P criteria. The first contribution of this thesis is a programmer-centric methodology, called sequential consistency normal form (SCNF), for specifying memory models. This methodology is based on the observation that performance-enhancing optimizations can be allowed without violating sequential consistency if the system is given some information about the program. An SCNF model is a contract between the system and the programmer, where the system guarantees both high performance and sequential consistency only if the programmer provides certain information about the program. Insufficient information gives lower performance, but incorrect information ...
Detecting Data Races on Weak Memory Systems
In Proceedings of the 18th Annual International Symposium on Computer Architecture, 1991
Cited by 49 (1 self)
For shared-memory systems, the most commonly assumed programmer's model of memory is sequential consistency. The weaker models of weak ordering, release consistency with sequentially consistent synchronization operations, data-race-free-0, and data-race-free-1 provide higher performance by guaranteeing sequential consistency only to a restricted class of programs, mainly programs that do not exhibit data races. To allow programmers to use the intuition and algorithms already developed for sequentially consistent systems, it is important to determine when a program written for a weak system exhibits no data races. In this paper, we investigate the extension of dynamic data race detection techniques developed for sequentially consistent systems to weak systems. A potential problem is that in the presence of a data race, weak systems fail to guarantee sequential consistency and therefore dynamic techniques may not give meaningful results. However, we reason that in practice a weak system...
On the Complexity of Event Ordering for Shared-Memory Parallel Program Executions
In Proceedings of the 1990 International Conference on Parallel Processing, 1990
Cited by 48 (6 self)
This paper presents results on the complexity of computing event orderings for shared-memory parallel program executions. Given a program execution, we formally define the problem of computing orderings that the execution must have exhibited or could have exhibited, and prove that computing such orderings is an intractable problem. We present a formal model of a shared-memory parallel program execution on a sequentially consistent processor, and discuss event orderings in terms of this model. We consider programs that use fork/join and either counting semaphores or event-style synchronization. We define a feasible program execution to be an execution of the program that performs the same events as an observed execution, but which may exhibit different orderings among those events. Any program execution exhibiting the same data dependences among the shared data as the observed execution is feasible. We define several relations that capture the orderings present in all (or some) of the...
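The must-have and could-have relations can be computed directly by enumerating every feasible ordering. That enumeration takes time exponential in the number of events, which is consistent with, though not a proof of, the intractability result. The sketch below is hypothetical and simplified: it treats a feasible ordering as any interleaving that respects program order and counting-semaphore semantics, and it ignores the shared-data dependences that the paper's definition also constrains.

```python
# Each event: (name, op, semaphore), op in {"P", "V", "compute"}; semaphore is None for compute.
THREADS = [
    [("a", "compute", None), ("v", "V", "s")],
    [("p", "P", "s"), ("b", "compute", None)],
    [("c", "compute", None)],
]


def feasible_orderings(threads, initial):
    """Yield each interleaving that respects program order and never lets a P
    take a counting semaphore below zero (shared-data dependences are ignored)."""
    def extend(pcs, counts, prefix):
        if all(pc == len(t) for pc, t in zip(pcs, threads)):
            yield prefix
            return
        for tid, thread in enumerate(threads):
            if pcs[tid] == len(thread):
                continue
            name, op, sem = thread[pcs[tid]]
            if op == "P" and counts[sem] == 0:
                continue  # this P would block at this point
            new_counts = dict(counts)
            if op in ("P", "V"):
                new_counts[sem] += -1 if op == "P" else 1
            next_pcs = pcs[:tid] + (pcs[tid] + 1,) + pcs[tid + 1:]
            yield from extend(next_pcs, new_counts, prefix + [name])

    yield from extend((0,) * len(threads), dict(initial), [])


def ordering_relations(threads, initial):
    """Return (must, could): event pairs ordered in every / in some feasible ordering."""
    orders = list(feasible_orderings(threads, initial))
    names = [event[0] for thread in threads for event in thread]
    must, could = set(), set()
    for a in names:
        for b in names:
            if a == b:
                continue
            before = [order.index(a) < order.index(b) for order in orders]
            if all(before):
                must.add((a, b))
            if any(before):
                could.add((a, b))
    return must, could
```

With semaphore `s` initially 0, the `P` cannot complete before the `V`, so `("a", "b")` is a must-have ordering. Event `c` is unsynchronized, so `("a", "c")` is only a could-have ordering. This example has five feasible orderings.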
Efficient On-the-Fly Data Race Detection in Multithreaded C++ Programs
In PPoPP ’03: Proceedings of the Ninth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2003
Cited by 46 (0 self)
Data race detection is essential for debugging multithreaded programs and assuring their correctness. Nevertheless, there is no single universal technique capable of handling the task efficiently, since the data race detection problem is computationally hard in the general case. Thus, to approximate the possible races in a program, all currently available tools take different “shortcuts”, such as using strong assumptions on the program structure or applying various heuristics. When applied to a general program, however, they usually produce excessive false alarms or miss a large number of races. Another major drawback of many currently available tools is that, for performance reasons, they are restricted to detection units of fixed size. Thus, they all suffer from the same problem: choosing a small unit might result in missing some of the data races, while choosing a large one might lead to false detection. In this work we present a novel testing tool, called MULTIRACE, which combines improved versions of DJIT and LOCKSET, two very powerful on-the-fly algorithms for dynamic detection of apparent data races. Both extended algorithms detect races in multithreaded programs that may execute on weak consistency systems, and may use two-way as well as global synchronization primitives. By employing novel technologies, MULTIRACE adjusts its detection to the native granularity of objects and variables in the program under examination. In order to monitor all accesses to each of the shared locations, MULTIRACE instruments the C++ source code of the program. It lets the user fine-tune the detection process, but otherwise is completely automatic and transparent. This paper describes the algorithms employed in MULTIRACE, as well as its implementation details. The paper also proposes some alternatives to and optimizations of MULTIRACE. It shows that the overheads imposed by MULTIRACE are often orders of magnitude smaller than those obtained by other existing dynamic techniques.
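The lockset side of such a tool can be illustrated with the basic candidate-set refinement popularized by Eraser. Each shared variable keeps the set of locks held at every access so far, and a warning is raised once that set becomes empty. The sketch below is a simplified, hypothetical version. It suppresses warnings until a second thread touches the variable, instead of using Eraser's full ownership state machine, and it does not model MULTIRACE's improved LOCKSET or its DJIT component.

```python
def lockset_warnings(accesses):
    """accesses: (thread, variable, locks_held) tuples in observed order.
    Returns variables, in first-detection order, whose candidate lockset became
    empty once at least two threads had accessed them."""
    candidates, threads_seen, warnings = {}, {}, []
    for thread, var, held in accesses:
        held = frozenset(held)
        # The candidate set starts as the locks held at the first access, then shrinks.
        candidates[var] = candidates.get(var, held) & held
        threads_seen.setdefault(var, set()).add(thread)
        if not candidates[var] and len(threads_seen[var]) > 1 and var not in warnings:
            warnings.append(var)
    return warnings


# Hypothetical access log.
ACCESSES = [
    (1, "balance", {"m"}),
    (2, "balance", {"m"}),
    (1, "count", {"m"}),
    (2, "count", set()),     # count is no longer protected by a common lock
    (1, "scratch", set()),
    (1, "scratch", set()),   # only one thread touches scratch
]
```

For this log, `lockset_warnings(ACCESSES)` reports only `["count"]`. Unlike a happens-before detector, this check does not depend on the particular interleaving observed, but the same property makes it prone to false alarms when synchronization other than locks is used.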

