Results 1 - 10 of 38
Foundations of the C++ Concurrency Memory Model - PLDI'08, 2008
Cited by 61 (6 self)
Currently, multi-threaded C or C++ programs combine a single-threaded programming language with a separate threads library. This is not entirely sound [7]. We describe an effort, currently nearing completion, to address these issues by explicitly providing semantics for threads in the next revision of the C++ standard. Our approach is similar to that recently followed by Java [25], in that, at least for a well-defined and interesting subset of the language, we give sequentially consistent semantics to programs that do not contain data races. Nonetheless, a number of our decisions are often surprising even to those familiar with the Java effort:
• We (mostly) insist on sequential consistency for race-free programs, in spite of implementation issues that came to light after the Java work.
• We give no semantics to programs with data races. There are no benign C++ data races.
• We use weaker semantics for trylock than existing languages or libraries, allowing us to promise sequential consistency with an intuitive race definition, even for programs with trylock.
This paper describes the simple model we would like to be able to provide for C++ threads programmers, and explains how this, together with some practical, but often under-appreciated implementation constraints, drives us towards the above decisions.
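The race-free guarantee described in this abstract can be illustrated with a minimal sketch. It assumes the facilities that were later standardized in C++11 (`std::thread`, `std::mutex`, `std::atomic`), which the abstract itself does not name; the helper names `locked_count` and `atomic_count` are hypothetical. Both functions avoid data races, one through a lock and one through an atomic variable, so their behavior can be reasoned about as a sequentially consistent interleaving.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: every worker increments a plain counter,
// but only while holding the mutex, so no two accesses race.
long locked_count(int threads, int iters) {
    long counter = 0;
    std::mutex m;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                std::lock_guard<std::mutex> guard(m);
                ++counter;
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}

// Hypothetical helper: the same computation with an atomic counter.
// fetch_add uses the default (sequentially consistent) ordering.
long atomic_count(int threads, int iters) {
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                counter.fetch_add(1);
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter.load();
}
```

Calling `locked_count(4, 1000)` or `atomic_count(4, 1000)` returns 4000. Updating a plain `long` from several threads without the lock would instead be a data race, which the model described above leaves without semantics.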
The ATOMOS Transactional Programming Language, 2006
Cited by 53 (4 self)
Atomos is the first programming language with implicit transactions, strong atomicity, and a scalable multiprocessor implementation. Atomos is derived from Java, but replaces its synchronization and conditional waiting constructs with simpler transactional alternatives. The Atomos watch statement allows programmers to specify fine-grained watch sets used with the Atomos retry conditional waiting statement for efficient transactional conflict-driven wakeup even in transactional memory systems with a limited number of transactional contexts. Atomos supports open-nested transactions, which are necessary for building both scalable application programs and virtual machine implementations.
Oracle semantics for concurrent separation logic - In Proc. European Symp. on Programming (ESOP 2008), 2008
Cited by 45 (10 self)
We define (with machine-checked proofs in Coq) a modular operational semantics for Concurrent C minor, a language with shared memory, spawnable threads, and first-class locks. By modular we mean that one can reason about sequential control and data-flow knowing almost nothing about concurrency, and one can reason about concurrency knowing almost nothing about sequential control and data-flow constructs. We present a Concurrent Separation Logic with first-class locks and threads, and prove its soundness with respect to the operational semantics. Using our modularity principle, we proved the sequential C.S.L. rules (those inherited from sequential Separation Logic) simply by adapting Appel & Blazy’s machine-checked soundness proofs. Our Concurrent C minor operational semantics is designed to connect to Leroy’s optimizing (sequential) C minor compiler; we propose our modular semantics as a way to adapt Leroy’s compiler-correctness proofs to the concurrent setting. Thus we will obtain end-to-end proofs: the properties you prove in Concurrent Separation Logic will be true of the program that actually executes on the machine.
Enforcing isolation and ordering in STM - In the Proceedings of the Conf. on Programming Language Design and Implementation, 2007
Cited by 41 (6 self)
Transactional memory provides a new concurrency control mechanism that avoids many of the pitfalls of lock-based synchronization. High-performance software transactional memory (STM) implementations thus far provide weak atomicity: accessing shared data both inside and outside a transaction can result in unexpected, implementation-dependent behavior. To guarantee isolation and consistent ordering in such a system, programmers are expected to enclose all shared-memory accesses inside transactions. A system that provides strong atomicity guarantees isolation even in the presence of threads that access shared data outside transactions. A strongly-atomic system also orders transactions with conflicting non-transactional memory operations in a consistent manner. In this paper, we discuss some surprising pitfalls of weak atomicity, and we present an STM system that avoids these problems.
CoreDet: A Compiler and Runtime System for Deterministic Multithreaded Execution
Cited by 34 (5 self)
The behavior of a multithreaded program does not depend only on its inputs. Scheduling, memory reordering, timing, and low-level hardware effects all introduce nondeterminism in the execution of multithreaded programs. This severely complicates many tasks, including debugging, testing, and automatic replication. In this work, we avoid these complications by eliminating their root cause: we develop a compiler and runtime system that runs arbitrary multithreaded C/C++ POSIX Threads programs deterministically. A trivial non-performant approach to providing determinism is simply deterministically serializing execution. Instead, we present a compiler and runtime infrastructure that ensures determinism but resorts to serialization rarely, for handling interthread communication and synchronization. We develop two basic approaches, both of which are largely dynamic with performance improved by some static compiler optimizations. First, an ownership-based approach detects interthread communication via an evolving table that tracks ownership of memory regions by threads. Second, a buffering approach uses versioned memory and employs a deterministic commit protocol to make changes visible to other threads. While buffering has larger single-threaded overhead than ownership, it tends to scale better (serializing less often). A hybrid system sometimes performs and scales better than either approach individually. Our implementation is based on the LLVM compiler infrastructure. It needs neither programmer annotations nor special hardware. Our empirical evaluation uses the PARSEC and SPLASH2 benchmarks and shows that our approach scales comparably to nondeterministic execution.
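The "trivial" approach this abstract mentions, deterministically serializing execution, can be sketched as follows. This is only an illustration of strict round-robin turn-taking under stated assumptions; it is not CoreDet's ownership-based or buffering implementation, and the helper name `serialized_log` is hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: each worker appends its id to a shared log
// `rounds` times, but only when it is that worker's turn. The turn
// passes in a fixed round-robin order, so the log is the same on
// every run regardless of how the OS schedules the threads.
std::vector<int> serialized_log(int threads, int rounds) {
    std::vector<int> log;
    std::mutex m;
    std::condition_variable cv;
    int turn = 0;  // id of the worker allowed to run next

    std::vector<std::thread> pool;
    for (int id = 0; id < threads; ++id) {
        pool.emplace_back([&, id] {
            for (int r = 0; r < rounds; ++r) {
                std::unique_lock<std::mutex> lk(m);
                cv.wait(lk, [&] { return turn == id; });
                log.push_back(id);
                turn = (turn + 1) % threads;  // hand the turn to the next worker
                cv.notify_all();
            }
        });
    }
    for (auto& t : pool) t.join();
    return log;
}
```

With three workers and two rounds, the log is always `0, 1, 2, 0, 1, 2`. The cost is that only one worker makes progress at a time, which is the scalability problem the abstract says its compiler and runtime are designed to avoid.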
WYSINWYX: What You See Is Not What You eXecute, 2009
Cited by 33 (7 self)
Over the last seven years, we have developed static-analysis methods to recover a good approximation to the variables and dynamically-allocated memory objects of a stripped executable, and to track the flow of values through them. The paper presents the algorithms that we developed, explains how they are used to recover intermediate representations (IRs) from executables that are similar to the IRs that would be available if one started from source code, and describes their application in the context of program understanding and automated bug hunting. Unlike algorithms for analyzing executables that existed prior to our work, the ones presented in this paper provide useful information about memory accesses, even in the absence of debugging information. The ideas described in the paper are incorporated in a tool for analyzing Intel x86 executables, called CodeSurfer/x86. CodeSurfer/x86 builds a system dependence graph for the program, and provides a GUI for exploring the graph by (i) navigating its edges, and (ii) invoking operations, such as forward slicing, backward slicing, and chopping, to discover how parts of the program can impact other parts. To assess the usefulness of the IRs recovered by CodeSurfer/x86 in the context of automated bug hunting, we built a tool on top of CodeSurfer/x86, called Device-Driver Analyzer for x86
A formally verified compiler backend, 2008
Cited by 29 (8 self)
This article describes the development and formal verification (proof of semantic preservation) of a compiler back-end from Cminor (a simple imperative intermediate language) to PowerPC assembly code, using the Coq proof assistant both for programming the compiler and for proving its correctness. Such a verified compiler is useful in the context of formal methods applied to the certification of critical software: the verification of the compiler guarantees that the safety properties proved on the source code hold for the executable compiled code as well. Categories and Subject Descriptors: F.3.1 [Logics and meanings of programs]: Specifying and verifying and reasoning about programs—Mechanical verification; D.2.4 [Software engineering]: Software/program verification—Correctness proofs, formal methods, reliability; D.3.4 [Programming languages]: Processors—Compilers, optimization
Concurrency among strangers: Programming in E as plan coordination - In Trustworthy Global Computing, International Symposium, TGC 2005, 2005
Cited by 14 (2 self)
Programmers write programs, expressing plans for machines to execute. When composed so that they may cooperate, plans may instead interfere with each other in unanticipated ways. Plan coordination is the art of simultaneously enabling plans to cooperate, while avoiding hazards of destructive plan interference. For sequential computation within a single machine, object programming supports plan coordination well. For concurrent computation, this paper shows how hard it is to use locking to prevent plans from interfering without also destroying their ability to cooperate. In Internet-scale computing, machines proceed concurrently, interact across barriers of large latencies and partial failure, and encounter each other’s misbehavior. Each dimension presents new plan coordination challenges. This paper explains how the E language addresses these joint challenges by changing only a few concepts of conventional sequential object programming. Several projects are adapting these insights to existing platforms.
Capabilities and Limitations of Library-Based Software Transactional Memory in C++
Cited by 9 (0 self)
Like many past extensions to user programming models, transactions can be added to the programming language or implemented in a library using existing language features. We describe a library-based transactional memory API for C++. Designed to address the limitations of an earlier API with similar functionality, the new interface leverages macros, exceptions, multiple inheritance, generics (templates), and overloading of operators (including pointer dereference) in an attempt to minimize syntactic clutter, admit a wide variety of back-end implementations, avoid arbitrary restrictions on otherwise valid language constructs, enable privatization, catch as many programmer errors as possible, and provide semantics that “seem natural” to C++ programmers. Having used our API to construct several small and one large application, we conclude that while the interface is a significant improvement on earlier efforts, and makes it practical for systems researchers to build nontrivial applications, it fails to realize the programming simplicity that was supposed to be the motivation for transactions in the first place. Several groups have proposed compiler support as a way to improve the performance of transactions. We conjecture that compiler (and language) support will be even more important as a way to improve the programming model.
Reordering Constraints for Pthread-Style Locks, 2005
Cited by 7 (3 self)
Keywords: threads, locks, memory barriers, memory fences, code reordering, data race, pthreads, optimization
C or C++ programs relying on the pthreads interface for concurrency are required to use a specified set of functions to avoid data races, and to ensure memory visibility across threads. Although the detailed rules are not completely clear [10], it is not hard to refine them to a simple set of clear and uncontroversial rules for at least a subset of the C language that excludes structures (and hence bit-fields). We precisely address the question of how locks in this subset must be implemented, and particularly when other memory operations can be reordered with respect to locks. This impacts the memory fences required in lock implementations, and hence has significant performance impact. Along the way, we show that a significant class of common compiler transformations is actually safe in the presence of pthreads, something which appears to have received minimal attention in the past. We show that, surprisingly to us, the reordering constraints are not symmetric for the lock and unlock operations. In particular, it is not always safe to move memory operations into a locked region by delaying them past a pthread_mutex_lock() call, but it is safe to move them into such a region by advancing them to before a pthread_mutex_unlock() call. We believe that this was not previously recognized, and there is evidence that it is underappreciated among implementors of thread libraries. Although our precise arguments are expressed in terms of statement reordering within a small subset language, we believe that our results capture the situation for a full C/C++ implementation. We also argue that our results are insensitive to the details of our literal (and reasonable, though possibly unintended) interpretation of the pthread standard. We believe that they accurately reflect hardware memory ordering constraints in addition to compiler constraints, and they appear to have implications beyond pthread environments.
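One way to see how delaying a write past a lock call could become observable is a second thread that uses `pthread_mutex_trylock()` to detect that the lock is held. The sketch below is an illustrative assumption and is not taken from the abstract; the helper name `observe_after_lock_seen` is hypothetical, and `x` is made atomic (with C++11 `std::atomic`) so that the sketch itself contains no data race.

```cpp
#include <atomic>
#include <cassert>
#include <pthread.h>
#include <thread>

// Hypothetical helper: returns the value of x that the observer thread
// reads once it has seen the mutex held by the writer thread.
int observe_after_lock_seen() {
    pthread_mutex_t l;
    pthread_mutex_init(&l, nullptr);
    std::atomic<int> x{0};
    std::atomic<bool> done{false};
    int seen = -1;

    std::thread writer([&] {
        x.store(42);             // write that precedes the lock call
        pthread_mutex_lock(&l);  // delaying the store past this call is the risky move
        while (!done.load()) std::this_thread::yield();  // keep the lock held
        pthread_mutex_unlock(&l);
    });

    std::thread observer([&] {
        // Spin until trylock fails, which means the writer holds the lock.
        while (pthread_mutex_trylock(&l) == 0) {
            pthread_mutex_unlock(&l);
            std::this_thread::yield();
        }
        seen = x.load();         // the writer's store came before its lock call
        done.store(true);
    });

    writer.join();
    observer.join();
    pthread_mutex_destroy(&l);
    return seen;
}
```

If an implementation moved the store to after the lock acquisition, the observer could see the lock held and still read 0. Moving an operation that follows `pthread_mutex_unlock()` to before the unlock has no comparable observer, which is one informal reading of the asymmetry the abstract describes.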

