Results 1 - 10
of
16
Foundations of the C++ Concurrency Memory Model
- PLDI'08
, 2008
"... Currently multi-threaded C or C++ programs combine a single-threaded programming language with a separate threads library. This is not entirely sound [7]. We describe an effort, currently nearing completion, to address these issues by explicitly providing semantics for threads in the next revision o ..."
Abstract
-
Cited by 61 (6 self)
- Add to MetaCart
Currently multi-threaded C or C++ programs combine a single-threaded programming language with a separate threads library. This is not entirely sound [7]. We describe an effort, currently nearing completion, to address these issues by explicitly providing semantics for threads in the next revision of the C++ standard. Our approach is similar to that recently followed by Java [25], in that, at least for a well-defined and interesting subset of the language, we give sequentially consistent semantics to programs that do not contain data races. Nonetheless, a number of our decisions are often surprising even to those familiar with the Java effort: • We (mostly) insist on sequential consistency for race-free programs, in spite of implementation issues that came to light after the Java work. • We give no semantics to programs with data races. There are no benign C++ data races. • We use weaker semantics for trylock than existing languages or libraries, allowing us to promise sequential consistency with an intuitive race definition, even for programs with trylock. This paper describes the simple model we would like to be able to provide for C++ threads programmers, and explain how this, together with some practical, but often under-appreciated implementation constraints, drives us towards the above decisions.
Threads cannot be implemented as a library
- In PLDI
, 2005
"... threads, library, register promotion, compiler optimization, garbage collection In many environments, multi-threaded code is written in a language that was originally designed without thread support (e.g. C), to which a library of threading primitives was subsequently added. There appears to be a ge ..."
Abstract
-
Cited by 51 (3 self)
- Add to MetaCart
threads, library, register promotion, compiler optimization, garbage collection In many environments, multi-threaded code is written in a language that was originally designed without thread support (e.g. C), to which a library of threading primitives was subsequently added. There appears to be a general understanding that this is not the right approach. We provide specific arguments that a pure library approach, in which the compiler is designed independently of threading issues, cannot guarantee correctness of the resulting code. We first review why the approach almost works, and then examine some of the surprising behavior it may entail. We further illustrate that there are very simple cases in which a pure library-based approach seems incapable of expressing an efficient parallel algorithm. Our discussion takes place in the context of C with Pthreads, since it is commonly used, reasonably well specified, and does not attempt to ensure type-safety, which would entail even stronger constraints. The issues we raise are not specific to that context.
Rigorous specification and conformance testing techniques for network protocols, as applied to TCP, UDP, and Sockets
- In Proceedings of ACM Conference on Computer Communication (SIGCOMM 2005
, 2005
"... Network protocols are hard to implement correctly. Despite the existence of RFCs and other standards, implementations often have subtle differences and bugs. One reason for this is that the specifications are typically informal, and hence inevitably contain ambiguities. Conformance testing against s ..."
Abstract
-
Cited by 26 (8 self)
- Add to MetaCart
Network protocols are hard to implement correctly. Despite the existence of RFCs and other standards, implementations often have subtle differences and bugs. One reason for this is that the specifications are typically informal, and hence inevitably contain ambiguities. Conformance testing against such specifications is challenging. In this paper we present a practical technique for rigorous protocol specification that supports specificationbased testing. We have applied it to TCP, UDP, and the Sockets API, developing a detailed ‘post-hoc’ specification that accurately reflects the behaviour of several existing implementations (FreeBSD 4.6, Linux 2.4.20-8, and Windows XP SP1). The development process uncovered a number of differences between and infelicities in these implementations. Our experience shows for the first time that rigorous specification is feasible for protocols as complex as TCP. We argue that the technique is also applicable ‘prehoc’, in the design phase of new protocols. We discuss how such a design-for-test approach should influence protocol development, leading to protocol specifications that are both unambiguous and clear, and to high-quality implementations that can be tested directly against those specifications. 1
Engineering with Logic: HOL Specification and Symbolic-Evaluation Testing for TCP Implementations
- POPL'06
, 2006
"... The TCP/IP protocols and Sockets API underlie much of modern computation, but their semantics have historically been very complex and ill-defined. The real standard is the de facto one of the common implementations, including, for example, the 15 000-- 20 000 lines of C in the BSD implementation. De ..."
Abstract
-
Cited by 19 (5 self)
- Add to MetaCart
The TCP/IP protocols and Sockets API underlie much of modern computation, but their semantics have historically been very complex and ill-defined. The real standard is the de facto one of the common implementations, including, for example, the 15 000-- 20 000 lines of C in the BSD implementation. Dealing rigorously with the behaviour of such bodies of code is challenging. We have
Memory Models: A Case for Rethinking Parallel Languages and Hardware
- COMMUNICATIONS OF THE ACM
, 2010
"... The era of parallel computing for the masses is here, but writing correct parallel programs remains far more difficult than writing sequential programs. Aside from a few domains, most parallel programs are written using a shared-memory approach. The memory model, which specifies the meaning of share ..."
Abstract
-
Cited by 16 (3 self)
- Add to MetaCart
The era of parallel computing for the masses is here, but writing correct parallel programs remains far more difficult than writing sequential programs. Aside from a few domains, most parallel programs are written using a shared-memory approach. The memory model, which specifies the meaning of shared variables, is at the heart of this programming model. Unfortunately, it has involved a tradeoff between programmability and performance, and has arguably been one of the most challenging and contentious areas in both hardware architecture and programming language specification. Recent broad community-scale efforts have finally led to a convergence in this debate, with popular languages such as Java and C++ and most hardware vendors publishing compatible memory model specifications. Although this convergence is a dramatic improvement, it has exposed fundamental shortcomings in current
Destructors, Finalizers, and Synchronization
- In ACM Symposium on Principles of Programming Languages. ACM
, 2003
"... We compare two di#erent facilities for running cleanup actions for objects that are about to reach the end of their life. ..."
Abstract
-
Cited by 11 (2 self)
- Add to MetaCart
We compare two di#erent facilities for running cleanup actions for objects that are about to reach the end of their life.
An Almost Non-Blocking Stack
- In Proceedings of the Twenty-third Annual ACM Symposium on Principles of Distributed Computing
, 2004
"... Non-blocking data structure implementations can be useful for performance and fault-tolerance reasons. And they are far easier to use correctly in a signal- or interrupt-handler context. ..."
Abstract
-
Cited by 8 (1 self)
- Add to MetaCart
Non-blocking data structure implementations can be useful for performance and fault-tolerance reasons. And they are far easier to use correctly in a signal- or interrupt-handler context.
Conflict Exceptions: Simplifying Concurrent Language Semantics with Precise Hardware Exceptions for Data-Races
, 2010
"... We argue in this paper that concurrency errors should be treated as exceptions, i.e., have fail-stop behavior and precise semantics. We propose an exception model based on conflict of synchronizationfree regions, which precisely detects a broad class of data-races. We show that our exceptions provid ..."
Abstract
-
Cited by 8 (5 self)
- Add to MetaCart
We argue in this paper that concurrency errors should be treated as exceptions, i.e., have fail-stop behavior and precise semantics. We propose an exception model based on conflict of synchronizationfree regions, which precisely detects a broad class of data-races. We show that our exceptions provide enough guarantees to simplify high-level programming language semantics and debugging, but are significantly cheaper to enforce than traditional data-race detection. To make the performance cost of enforcement negligible, we propose architecture support for accurately detecting and precisely delivering these exceptions. We evaluate the suitability of our model as well as the behavior of our architectural mechanisms using the PARSEC benchmark suite and commercial applications. Our results show that the exception model largely reflects how programmers are already writing code and that the main memory, traffic and performance overheads of the enforcement mechanisms we propose are very low.
Reordering Constraints for Pthread-Style Locks
, 2005
"... threads, locks, memory barriers, memory fences, code reordering, data race, pthreads, optimization C or C++ programs relying on the pthreads interface for concurrency are required to use a specified set of functions to avoid data races, and to ensure memory visibility across threads. Although the de ..."
Abstract
-
Cited by 7 (3 self)
- Add to MetaCart
threads, locks, memory barriers, memory fences, code reordering, data race, pthreads, optimization C or C++ programs relying on the pthreads interface for concurrency are required to use a specified set of functions to avoid data races, and to ensure memory visibility across threads. Although the detailed rules are not completely clear[10], it is not hard to refine them to a simple set of clear and uncontroversial rules for at least a subset of the C language that excludes structures (and hence bit-fields). We precisely address the question of how locks in this subset must be implemented, and particularly when other memory operations can be reordered with respect to locks. This impacts the memory fences required in lock implementations, and hence has significant performance impact. Along the way, we show that a significant class of common compiler transformations are actually safe in the presence of pthreads, something which appears to have received minimal attention in the past. We show that, surprisingly to us, the reordering constraints are not symmetric for the lock and unlock operations. In particular, it is not always safe to move memory operations into a locked region by delaying them past a pthread mutex lock() call, but it is safe to move them into such a region by advancing them to before a pthread mutex unlock() call. We believe that this was not previously recognized, and there is evidence that it is under appreciated among implementors of thread libraries. Although our precise arguments are expressed in terms of statement reordering within a small subset language, we believe that our results capture the situation for a full C/C++ implementation. We also argue that our results are insensitive to the details of our literal (and reasonable, though possibly unintended) interpretation of the pthread standard. We believe that they accurately reflect hardware memory ordering constraints in addition to compiler constraints. And they appear to have implications beyond pthread environments.
Extended Sequential Reasoning for Data-Race-Free Programs ∗
"... Most multithreaded programming languages prohibit or discourage data races. By avoiding data races, we are guaranteed that variables accessed within a synchronization-free code region cannot be modified by other threads, allowing us to reason about such code regions as though they were single-thread ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
Most multithreaded programming languages prohibit or discourage data races. By avoiding data races, we are guaranteed that variables accessed within a synchronization-free code region cannot be modified by other threads, allowing us to reason about such code regions as though they were single-threaded. However, such single-threaded reasoning is not limited to synchronization-free regions. We present a simple characterization of extended interference-free regions in which variables cannot be modified by other threads. This characterization shows that, in the absence of data races, important code analysis problems often have surprisingly easy answers. For instance, we can use local analysis to determine when lock and unlock calls refer to the same mutex. Our characterization can be derived from prior work on safe compiler transformations, but it can also be simply derived from first principles, and justified in a very broad context. In addition, systematic reasoning about overlapping interference-free regions yields insight about optimization opportunities that were not previously apparent. We give preliminary results for a prototype implementation of interference-free regions in the LLVM compiler infrastructure. We also discuss other potential applications for interference-free regions.

