Results 11 - 20
of
194
Carbon: architectural support for fine-grained parallelism on chip multiprocessors
- In ISCA ’07: Proceedings of the 34th annual international symposium on Computer architecture
, 2007
"... ABSTRACT Chip multiprocessors (CMPs) are now commonplace, and the num-ber of cores on a CMP is likely to grow steadily. However, in order ..."
Abstract
-
Cited by 25 (3 self)
- Add to MetaCart
ABSTRACT Chip multiprocessors (CMPs) are now commonplace, and the num-ber of cores on a CMP is likely to grow steadily. However, in order
stateless model checking
- In PLDI 08: Programming Language Design and Implementation
, 2008
"... Stateless model checking is a useful state-space exploration technique for systematically testing complex real-world software. Existing stateless model checkers are limited to the verification of safety properties on terminating programs. However, realistic concurrent programs are nonterminating, a ..."
Abstract
-
Cited by 25 (4 self)
- Add to MetaCart
Stateless model checking is a useful state-space exploration technique for systematically testing complex real-world software. Existing stateless model checkers are limited to the verification of safety properties on terminating programs. However, realistic concurrent programs are nonterminating, a property that significantly reduces the efficacy of stateless model checking in testing them. Moreover, existing stateless model checkers are unable to verify that a nonterminating program satisfies the important liveness property of livelock-freedom, a property that requires the program to make continuous progress for any input. To address these shortcomings, this paper argues for incorporating a fair scheduler in stateless exploration. The key contribution of this paper is an explicit scheduler that is (strongly) fair and at the same time sufficiently nondeterministic to guarantee full coverage of safety properties. We have implemented the fair scheduler in the CHESS model checker. We show through theoretical arguments and empirical evaluation that our algorithm satisfies two important properties: 1) it visits all states of a finite-state program achieving state coverage at a faster rate than existing techniques, and 2) it finds all livelocks in a finite-state program. Before this work, nonterminating programs had to be manually modified in order to apply CHESS to them. The addition of fairness has allowed CHESS to be effectively applied to real-world nonterminating programs without any modification. For example, we have successfully booted the Singularity operating system under the control of CHESS. Categories and Subject Descriptors D.2.4 [Software Engineering]: Software/Program Verification — formal methods, validation;
Transparent proxies for java futures
- In Proceedings of the nineteenth annual
, 2004
"... A proxy object is a surrogate or placeholder that controls access to another target object. Proxies can be used to support distributed programming, lazy or parallel evaluation, access control, and other simple forms of behavioral reflection. However, wrapper proxies (like futures or suspensions for ..."
Abstract
-
Cited by 23 (2 self)
- Add to MetaCart
A proxy object is a surrogate or placeholder that controls access to another target object. Proxies can be used to support distributed programming, lazy or parallel evaluation, access control, and other simple forms of behavioral reflection. However, wrapper proxies (like futures or suspensions for yet-to-be-computed results) can require significant code changes to be used in statically-typed languages, while proxies more generally can inadvertently violate assumptions of transparency, resulting in subtle bugs. To solve these problems, we have designed and implemented a simple framework for proxy programming that employs a static analysis based on qualifier inference, but with additional novelties. Code for using wrapper proxies is automatically introduced via a classfile-to-classfile transformation, and potential violations of transparency are signaled to the programmer. We have formalized our analysis and proven it sound. Our framework has a variety of applications, including support for asynchronous method calls returning futures. Experimental results demonstrate the benefits of our framework: programmers are relieved of managing and/or checking proxy usage, analysis times are reasonably fast, overheads introduced by added dynamic checks are negligible, and performance improvements can be significant. For example, changing two lines in a simple RMI-based peer-to-peer application and then using our framework resulted in a large performance gain. 1
STAPL: An adaptive, generic parallel C++ library
- IN INT. WORKSHOP ON LANGUAGES AND COMPILERS FOR PARALLEL COMPUTING, 2001, [CDROM
, 2003
"... The Standard Template Adaptive Parallel Library (STAPL) is a parallel library designed as a superset of the ANSI C++ Standard Template Library (STL). It is sequentially consistent for functions with the same name, and executes on uni- or multi-processor systems that utilize shared or distributed me ..."
Abstract
-
Cited by 22 (5 self)
- Add to MetaCart
The Standard Template Adaptive Parallel Library (STAPL) is a parallel library designed as a superset of the ANSI C++ Standard Template Library (STL). It is sequentially consistent for functions with the same name, and executes on uni- or multi-processor systems that utilize shared or distributed memory. STAPL is implemented using simple parallel extensions of C++ that currently provide a SPMD model of parallelism, and supports nested parallelism. The library is intended to be general purpose, but emphasizes irregular programs to allow the exploitation of parallelism for applications which use dynamically linked data structures such as particle transport calculations, molecular dynamics, geometric modeling, and graph algorithms. STAPL provides several different algorithms for some library routines, and selects among them adaptively at runtime. STAPL can replace STL automatically by invoking a preprocessing translation phase. In the applications studied, the performance of translated code was within 5 % of the results obtained using STAPL directly. STAPL also provides functionality to allow the user to further optimize the code and achieve additional performance gains. We present results obtained using STAPL for a molecular dynamics code and a particle transport code.
The Performance of Work Stealing in Multiprogrammed Environments
- IN PROCEEDINGS OF THE 1998 ACM SIGMETRICS INTERNATIONAL CONFERENCE ON MEASUREMENT AND MODELING OF COMPUTER SYSTEMS, POSTER SESSION
, 1997
"... We study the performance of user-level thread schedulers in multiprogrammed environments. Our goal is a user-level thread scheduler that delivers efficient performance under multiprogramming without any need for kernel-level resource management, such as coscheduling or process control. We show that ..."
Abstract
-
Cited by 19 (4 self)
- Add to MetaCart
We study the performance of user-level thread schedulers in multiprogrammed environments. Our goal is a user-level thread scheduler that delivers efficient performance under multiprogramming without any need for kernel-level resource management, such as coscheduling or process control. We show that a non-blocking implementation of the work-stealing algorithm achieves this goal. With this implementation, the execution time of a computation running with arbitrarily many processes on arbitrarily many processors can be modeled as a simple function of work and critical-path length. This model holds even when the processes run on a set of processors that arbitrarily grows and shrinks over time. We observe linear speedup whenever the number of processes is small relative to the average parallelism.
Scheduling Threads for Low Space Requirement and Good Locality
- In Proceedings of the Eleventh Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA
, 1999
"... The running time and memory requirement of a parallel program with dynamic, lightweight threads depends heavily on the underlying thread scheduler. In this paper, we present a simple, asynchronous, space-efficient scheduling algorithm for shared memory machines that combines the low scheduling overh ..."
Abstract
-
Cited by 17 (1 self)
- Add to MetaCart
The running time and memory requirement of a parallel program with dynamic, lightweight threads depends heavily on the underlying thread scheduler. In this paper, we present a simple, asynchronous, space-efficient scheduling algorithm for shared memory machines that combines the low scheduling overheads and good locality of work stealing with the low space requirements of depth-first schedulers. For a nested-parallel program with depth D and serial space requirement S 1 , we show that the expected space requirement is S 1 +O(K \Delta p \Delta D) on p processors. Here, K is a user-adjustable runtime parameter, which provides a tradeoff between running time and space requirement. Our algorithm achieves good locality and low scheduling overheads by automatically increasing the granularity of the work scheduled on each processor. We have implemented the new scheduling algorithm in the context of a native, user-level implementation of Posix standard threads or Pthreads, and evaluated its p...
Portable High-Performance Programs
, 1999
"... right notice and this permission notice are preserved on all copies. ..."
Abstract
-
Cited by 16 (0 self)
- Add to MetaCart
right notice and this permission notice are preserved on all copies.
Memory Models: A Case for Rethinking Parallel Languages and Hardware
- COMMUNICATIONS OF THE ACM
, 2010
"... The era of parallel computing for the masses is here, but writing correct parallel programs remains far more difficult than writing sequential programs. Aside from a few domains, most parallel programs are written using a shared-memory approach. The memory model, which specifies the meaning of share ..."
Abstract
-
Cited by 16 (3 self)
- Add to MetaCart
The era of parallel computing for the masses is here, but writing correct parallel programs remains far more difficult than writing sequential programs. Aside from a few domains, most parallel programs are written using a shared-memory approach. The memory model, which specifies the meaning of shared variables, is at the heart of this programming model. Unfortunately, it has involved a tradeoff between programmability and performance, and has arguably been one of the most challenging and contentious areas in both hardware architecture and programming language specification. Recent broad community-scale efforts have finally led to a convergence in this debate, with popular languages such as Java and C++ and most hardware vendors publishing compatible memory model specifications. Although this convergence is a dramatic improvement, it has exposed fundamental shortcomings in current
Phoenix : a Parallel Programming Model for Accommodating Dynamically Joining Resources
- In Proc. of PPoPP
, 2003
"... This paper proposes Phoenix, a programming model for writing parallel and distributed applications that accommodate dynamically joining/leaving compute resources. In the proposed model, nodes involved in an application see a large and fixed virtual node name space. They communicate via messages, who ..."
Abstract
-
Cited by 14 (3 self)
- Add to MetaCart
This paper proposes Phoenix, a programming model for writing parallel and distributed applications that accommodate dynamically joining/leaving compute resources. In the proposed model, nodes involved in an application see a large and fixed virtual node name space. They communicate via messages, whose destinations are specified by virtual node names, rather than names bound to a physical resource. We describe Phoenix API and show how it allows a transparent migration of application states, as well as dynamically joining/leaving nodes as its by-product. We also demonstrate through several application studies that Phoenix model is close enough to regular message passing, thus it is a general programming model that facilitates porting many parallel applications/algorithms to more dynamic environments. Experimental results indicate applications that have a small task migration cost can quickly take advantage of dynamically joining resources using Phoenix. Divide-and-conquer algorithms written in Phoenix achieved a good speedup with a large number of nodes across multiple LANs (120 times speedup using 169 CPUs across three LANs). We believe Phoenix provides a useful programming abstraction and platform for emerging parallel applications that must be deployed across multiple LANs and/or shared clusters having dynamically varying resource conditions.
Optimistic Evaluation: An Adaptive Evaluation Strategy for Non-Strict Programs
- In ICFP ’03: Proceedings of the eighth ACM SIGPLAN international conference on Functional programming
, 2003
"... Lazy programs are beautiful, but they are slow because they build many thunks. Simple measurements show that most of these thunks are unnecessary: they are in fact always evaluated, or are always cheap. In this paper we describe Optimistic Evaluation --- an evaluation strategy that exploits this obs ..."
Abstract
-
Cited by 14 (2 self)
- Add to MetaCart
Lazy programs are beautiful, but they are slow because they build many thunks. Simple measurements show that most of these thunks are unnecessary: they are in fact always evaluated, or are always cheap. In this paper we describe Optimistic Evaluation --- an evaluation strategy that exploits this observation. Optimistic Evaluation complements compile-time analyses with run-time experiments: it evaluates a thunk speculatively, but has an abortion mechanism to back out if it makes a bad choice. A run-time adaption mechanism records expressions found to be unsuitable for speculative evaluation, and arranges for them to be evaluated more lazily in the future.

