Results 1 - 10 of 29
the Parallel Computing Landscape
Cited by 25 (0 self)
doi:10.1145/1562764.1562783. Writing programs that scale with increasing numbers of cores should be as easy as writing programs for sequential computers.
Colorama: Architectural support for data-centric synchronization
In Proc. of the 13th International Symposium on High-Performance Computer Architecture, 2007
Cited by 12 (4 self)
With the advent of ubiquitous multi-core architectures, a major challenge is to simplify parallel programming. One way to tame one of the main sources of programming complexity, namely synchronization, is transactional memory (TM). However, we argue that TM does not go far enough, since the programmer still needs nonlocal reasoning to decide where to place transactions in the code. A significant improvement to the art is Data-Centric Synchronization (DCS), where the programmer uses local reasoning to assign synchronization constraints to data. Based on these, the system automatically infers critical sections and inserts synchronization operations. This paper proposes novel architectural support to make DCS feasible, and describes its programming model and interface. The proposal, called Colorama, needs only modest hardware extensions, supports general-purpose, pointer-based languages such as C/C++ and, in our opinion, can substantially simplify the task of writing new parallel programs.
Global Principal Typing in Partially Commutative Asynchronous Sessions
Cited by 12 (8 self)
We generalise a theory of multiparty session types for the π-calculus through asynchronous communication subtyping, which allows partial commutativity of actions with maximal flexibility and safe optimisation in message choreography. A sound and complete algorithm for the subtyping relation, which can calculate conformance of optimised end-point processes to an agreed global specification, is presented. As a complementing result, we show a type inference algorithm for deriving the principal global specification from end-point processes which is minimal with respect to subtyping. The resulting theory allows a programmer to choose between a top-down and a bottom-up style of communication programming, ensuring the same desirable properties of typable processes.
The Tao of Parallelism in Algorithms
In PLDI, 2011
Cited by 6 (1 self)
For more than thirty years, the parallel programming community has used the dependence graph as the main abstraction for reasoning about and exploiting parallelism in “regular” algorithms that use dense arrays, such as finite-differences and FFTs. In this paper, we argue that the dependence graph is not a suitable abstraction for algorithms in new application areas like machine learning and network analysis in which the key data structures are “irregular” data structures like graphs, trees, and sets. To address the need for better abstractions, we introduce a data-centric formulation of algorithms called the operator formulation in which an algorithm is expressed in terms of its action on data structures. This formulation is the basis for a structural analysis of algorithms that we call tao-analysis. Tao-analysis can be viewed as an abstraction of algorithms that distills out algorithmic properties
How do programs become more concurrent? A story of program transformations.
2008
Cited by 3 (2 self)
For several decades, programmers have relied on Moore’s Law to improve the performance of their software applications. From now on, programmers need to program the multi-cores if they want to deliver efficient code. In the multi-core era, a major maintenance task will be to make sequential programs more concurrent. What are the most common transformations to retrofit concurrency into sequential programs? We studied the source code of 5 open-source Java projects. We analyzed qualitatively and quantitatively the change patterns that developers have used in order to retrofit concurrency. We found that these transformations belong to four categories: transformations that improve the latency, the throughput, the scalability, or correctness of the applications. In addition, we report on our experience of parallelizing one of our own programs. Our findings can educate software developers on how to parallelize sequential programs, and can provide hints for tool vendors about what transformations are worth automating.
W.F.: High-level Multicore Programming With XJava
In: Comp. ICSE 2009, New Ideas And Emerging Results. ACM, 2009
Cited by 3 (3 self)
Multicore chips are becoming mainstream, but programming them is difficult because the prevalent thread-based programming model is error-prone and does not scale well. To address this problem, we designed XJava, an extension of Java that permits the direct expression of producer/consumer, pipeline, master/slave, and data parallelism. The central concept of the extension is the task, a parallel activity similar to a filter in Unix. Tasks can be combined with new operators to create arbitrary nestings of parallel activities. Preliminary experience with XJava and its compiler suggests that the extensions lead to code savings and reduce the potential for synchronization defects, while preserving the advantages of object-orientation and type-safety. The proposed extensions provide intuitive “what-you-see-is-what-you-get” parallelism. They also enable other software tools, such as auto-tuning and accurate static analysis for race detection.
Amorphous Data-parallelism in Irregular Algorithms
Cited by 2 (1 self)
Most client-side applications running on multicore processors are likely to be irregular programs that deal with complex, pointer-based data structures such as large sparse graphs and trees. However, we understand very little about the nature of parallelism in irregular algorithms, let alone how to exploit it effectively on multicore processors. In this paper, we show that, although the behavior of irregular algorithms can be very complex, many of them have a generalized data-parallelism that we call amorphous data-parallelism. The algorithms in our study come from a variety of important disciplines such as data-mining, AI, compilers, networks, and scientific computing. We also argue that these algorithms can be divided naturally into a small number of categories, and that this categorization provides a lot of insight into their behavior. Finally, we discuss how these insights should guide programming language support and parallel system implementation for irregular algorithms.
Deferring Design Pattern Decisions and Automating Structural Pattern Changes using a Design-Pattern-Based Programming System
Cited by 2 (0 self)
In the design phase of software development, the designer must make many fundamental design decisions concerning the architecture of the system. Incorrect decisions are relatively easy and inexpensive to fix if caught during the design process, but the difficulty and cost rise significantly if problems are not found until after coding begins. Unfortunately, it is not always possible to find incorrect design decisions during the design phase. To reduce the cost of expensive corrections, it would be useful to have the ability to defer some design decisions as long as possible, even into the coding stage. Failing that, tool support for automating design changes would give more freedom to revisit and change these decisions when needed. This paper shows how a design-pattern-based programming system based on generative design patterns can support the deferral of design decisions where possible, and automate changes where necessary. A generative design pattern is a parameterized pattern form that is capable of generating code for different versions of the underlying design pattern. We demonstrate these ideas in the context of a parallel application written with the CO2P3S pattern-based parallel programming system. We show that CO2P3S can defer the choice of execution architecture (shared-memory or distributed-memory), and can automate several changes to the application structure that would normally be daunting to tackle
P.: Co-design of distributed systems using skeletons and autonomic management abstractions
EuroPar 2008 Workshops. LNCS, 2009
Cited by 2 (2 self)
We discuss how common problems arising with multi/manycore distributed architectures can be effectively handled through co-design of parallel/distributed programming abstractions and of autonomic management of non-functional concerns. In particular, we demonstrate how restricted parallel/distributed patterns (or skeletons) may be efficiently managed by rule-based autonomic managers. We discuss the basic principles underlying pattern+manager co-design, current implementations inspired by this approach and some results achieved with a proof-of-concept prototype.
XJava: Exploiting parallelism with object-oriented stream programming
In Euro-Par 2009, volume 5704 of LNCS, 2009
Cited by 2 (2 self)
This paper presents the XJava compiler for parallel programs. It exploits parallelism based on an object-oriented stream programming paradigm. XJava extends Java with new parallel constructs that do not expose programmers to low-level details of parallel programming on shared memory machines. Tasks define composable parallel activities, and new operators allow an easier expression of parallel patterns, such as pipelines, divide and conquer, or master/worker. We also present an automatic run-time mechanism that extends our previous work to automatically map tasks and parallel statements to threads. We conducted several case studies with an open source desktop search application and a suite of benchmark programs. The results show that XJava reduces the opportunities to introduce synchronization errors. Compared to threaded Java, the amount of code could be reduced by up to 39%. The run-time mechanism helped reduce effort for performance tuning and achieved speedups up to 31.5 on an eight core machine.
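For orientation only: the pipeline pattern named in the XJava abstracts above can be approximated in plain threaded Java, the baseline those abstracts compare against. The sketch below is not XJava syntax and is not taken from the papers. The class name `PipelineSketch`, the method `run`, the queue capacity, and the use of an empty `Optional` as an end-of-stream marker are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration: a three-stage pipeline (produce -> double -> collect)
// written with plain threads and bounded queues. Not XJava.
public class PipelineSketch {

    public static List<Integer> run(List<Integer> input) {
        // Bounded queues connect the stages; an empty Optional marks end of stream
        // (an assumed convention for this sketch).
        BlockingQueue<Optional<Integer>> toDoubler = new ArrayBlockingQueue<>(16);
        BlockingQueue<Optional<Integer>> toCollector = new ArrayBlockingQueue<>(16);

        // Stage 1: feed every input element into the first queue.
        Thread producer = new Thread(() -> {
            try {
                for (Integer x : input) {
                    toDoubler.put(Optional.of(x));
                }
                toDoubler.put(Optional.empty());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Stage 2: double each element and forward it, passing the marker along.
        Thread doubler = new Thread(() -> {
            try {
                while (true) {
                    Optional<Integer> item = toDoubler.take();
                    if (!item.isPresent()) {
                        toCollector.put(Optional.empty());
                        break;
                    }
                    toCollector.put(Optional.of(item.get() * 2));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        doubler.start();

        // Stage 3 runs on the calling thread and collects results in arrival order.
        List<Integer> out = new ArrayList<>();
        try {
            while (true) {
                Optional<Integer> item = toCollector.take();
                if (!item.isPresent()) {
                    break;
                }
                out.add(item.get());
            }
            producer.join();
            doubler.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(1, 2, 3))); // prints [2, 4, 6]
    }
}
```

Even this small pipeline needs explicit queues, an end-of-stream convention, and interrupt handling, which is the kind of low-level detail the abstracts say XJava's task and operator constructs are meant to hide.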

