• Documents
  • Authors
  • Tables
  • Other Seers ▼
    RefSeer AckSeer CollabSeer SeerSeer
  • Log in
  • Sign up
  • MetaCart

CiteSeerX logo

Advanced Search Include Citations
Advanced Search Include Citations | Disambiguate

S.: Feedback directed implicit parallelism (2007)

by T Harris, Singh
Add To MetaCart

Tools

Sorted by:
Results 1 - 10 of 14
Next 10 →

Runtime Support for Multicore Haskell

by Simon Marlow, Simon Peyton Jones, Satnam Singh
"... Purely functional programs should run well on parallel hardware because of the absence of side effects, but it has proved hard to realise this potential in practice. Plenty of papers describe promising ideas, but vastly fewer describe real implementations with good wall-clock performance. We describ ..."
Abstract - Cited by 18 (5 self) - Add to MetaCart
Purely functional programs should run well on parallel hardware because of the absence of side effects, but it has proved hard to realise this potential in practice. Plenty of papers describe promising ideas, but vastly fewer describe real implementations with good wall-clock performance. We describe just such an implementation, and quantitatively explore some of the complex design tradeoffs that make such implementations hard to build. Our measurements are necessarily detailed and specific, but they are reproducible, and we believe that they offer some general insights. 1.

How much parallelism is there in irregular applications

by Milind Kulkarni, Martin Burtscher, Rajasekhar Inkulu, Keshav Pingali - In PPoPP , 2009
"... Irregular programs are programs organized around pointer-based data structures such as trees and graphs. Recent investigations by the Galois project have shown that many irregular programs have a generalized form of data-parallelism called amorphous data-parallelism. However, in many programs, amorp ..."
Abstract - Cited by 10 (2 self) - Add to MetaCart
Irregular programs are programs organized around pointer-based data structures such as trees and graphs. Recent investigations by the Galois project have shown that many irregular programs have a generalized form of data-parallelism called amorphous data-parallelism. However, in many programs, amorphous dataparallelism cannot be uncovered using static techniques, and its exploitation requires runtime strategies such as optimistic parallel execution. This raises a natural question: how much amorphous data-parallelism actually exists in irregular programs? In this paper, we describe the design and implementation of a tool called ParaMeter that produces parallelism profiles for irregular programs. Parallelism profiles are an abstract measure of the amount of amorphous data-parallelism at different points in the execution of an algorithm, independent of implementation-dependent details such as the number of cores, cache sizes, load-balancing, etc. ParaMeter can also generate constrained parallelism profiles for a fixed number of cores. We show parallelism profiles for seven irregular applications, and explain how these profiles provide insight into the behavior of these applications.

Seq no more: Better Strategies for Parallel Haskell

by Simon Marlow, Mustafa K Aswad, Patrick Maier, Phil Trinder, Hans-wolfgang Loidl
"... We present a complete redesign of Evaluation Strategies, a key abstraction for specifying pure, deterministic parallelism in Haskell. Our new formulation preserves the compositionality and modularity benefits of the original, while providing significant new benefits. First, we introduce an evaluatio ..."
Abstract - Cited by 5 (2 self) - Add to MetaCart
We present a complete redesign of Evaluation Strategies, a key abstraction for specifying pure, deterministic parallelism in Haskell. Our new formulation preserves the compositionality and modularity benefits of the original, while providing significant new benefits. First, we introduce an evaluation-order monad to provide clearer, more generic, and more efficient specification of parallel evaluation. Secondly, the new formulation resolves a subtle space management issue with the original strategies, allowing parallelism (sparks) to be preserved while reclaiming heap associated with superfluous parallelism. Related to this, the new formulation provides far better support for speculative parallelism as the garbage collector now prunes unneeded speculation. Finally, the new formulation provides improved compositionality: we can directly express parallelism embedded within lazy data structures, producing more compositional strategies, and our basic strategies are parametric in the coordination combinator, facilitating a richer set of parallelism combinators. We give measurements over a range of benchmarks demonstrating that the runtime overheads of the new formulation relative to the original are low, and the new strategies even yield slightly better speedups on average than the original strategies.

Profiling java programs for parallelism

by Clemens Hammacher, Kevin Streit, Sebastian Hack, Andreas Zeller - In Proceedings of the 2009 ICSE Workshop on Multicore Software Engineering, IWMSE ’09 , 2009
"... One of the biggest challenges imposed by multi-core architectures is how to exploit their potential for legacy systems not built with multiple cores in mind. By analyzing dynamic data dependences of a program run, one can identify independent computation paths that could have been handled by individ ..."
Abstract - Cited by 3 (0 self) - Add to MetaCart
One of the biggest challenges imposed by multi-core architectures is how to exploit their potential for legacy systems not built with multiple cores in mind. By analyzing dynamic data dependences of a program run, one can identify independent computation paths that could have been handled by individual cores. Our prototype computes dynamic dependences for Java programs and recommends locations to the programmer with the highest potential for parallelization. Such measurements can also provide starting points for automatic, speculative parallelization. 1.

Memory abstractions for speculative multithreading

by Christopher J. F. Pickett, Clark Verbrugge, Allan Kielstra , 2008
"... Speculative multithreading is a promising technique for automatic parallelization. However, our experience with a software implementation indicates that there is significant overhead involved in managing the heap memory internal to the speculative system and significant complexity in correctly manag ..."
Abstract - Cited by 1 (1 self) - Add to MetaCart
Speculative multithreading is a promising technique for automatic parallelization. However, our experience with a software implementation indicates that there is significant overhead involved in managing the heap memory internal to the speculative system and significant complexity in correctly managing the call stack memory used by speculative tasks. We first describe a specialized heap management system that allows the data structures necessary to support speculation to be quickly recycled. We take advantage of constraints imposed by our speculation model to reduce the complexity of this otherwise general producer/consumer memory problem. We next provide a call stack abstraction that allows the local variable requirements of in-order and out-of-order nested method level speculation to be expressed and hence implemented in a relatively simple fashion. We believe that heap and stack memory management issues are important to efficient speculative system design. We also believe that abstractions such as ours help provide a framework for understanding the requirements of different speculation models. 1

Understanding method level speculation

by Christopher J. F. Pickett, Clark Verbrugge, Allan Kielstra, Christopher J. F. Pickett, Clark Verbrugge, Allan Kielstra - Sable Research Group, School of Computer Science, McGill University , 2009
"... Method level speculation (MLS) is an optimistic technique for parallelizing imperative programs, for which a variety of MLS systems and optimizations have been proposed. However, runtime performance strongly depends on the interaction between program structure and MLS system design choices, making i ..."
Abstract - Cited by 1 (1 self) - Add to MetaCart
Method level speculation (MLS) is an optimistic technique for parallelizing imperative programs, for which a variety of MLS systems and optimizations have been proposed. However, runtime performance strongly depends on the interaction between program structure and MLS system design choices, making it difficult to compare approaches or understand in a general way how programs behave under MLS. Here we develop an abstract list-based model of speculative execution that encompasses several MLS designs, and a concrete stack-based model that is suitable for implementations. Using our abstract model, we show equivalence and correctness for a variety of MLS designs, unifying in-order and out-of-order execution models. Using our concrete model, we directly explore the execution behaviour of simple imperative programs, and show how specific parallelization patterns are induced by combining common programming idioms with precise speculation decisions. This basic groundwork establishes a common basis for understanding MLS designs, and suggests more formal directions for optimizing MLS behaviour and application. 1.

General Terms

by Armand Navabi, Xiangyu Zhang, Suresh Jagannathan
"... Migrating sequential programs to effectively utilize next generation multicore architectures is a key challenge facing application developers and implementors. Languages like Java that support complex control- and dataflow abstractions confound classical automatic parallelization techniques. On the ..."
Abstract - Add to MetaCart
Migrating sequential programs to effectively utilize next generation multicore architectures is a key challenge facing application developers and implementors. Languages like Java that support complex control- and dataflow abstractions confound classical automatic parallelization techniques. On the other hand, introducing multithreading and concurrency control explicitly into programs can impose a high conceptual burden on the programmer, and may entail a significant rewrite of the original program. In this paper, we consider a new technique to address this issue. Our approach makes use of futures, a simple annotation that introduces asynchronous concurrency into Java programs, but provides no concurrency control. To ensure concurrent execution does not yield behavior inconsistent with sequential execution (i.e., execution yielded by erasing all futures), we present a new interprocedural summary-based dataflow analysis. The analysis inserts lightweight barriers that block and resume threads executing futures if a dependency violation may ensue. There are no constraints on how threads execute other than those imposed by these barriers. Our experimental results indicate futures can be leveraged to transparently ensure safety and profitably exploit parallelism; in contrast to earlier efforts, our technique is completely portable, and requires no modifications to the underlying JVM. Categories and Subject Descriptors D.3.3 [Programming Languages]: Language Constructs and Features—Concurrent programming

Calculating likely Parallelism within Dependant Conjunctions for Logic Programs

by Paul Bone , 2008
"... The rate at which computers are becoming faster at sequential execution has dropped significantly. Instead parallel processing power is increasing, and multicore computers are becoming more common. Automatically parallelising programs is becoming much more desirable. Parallelising programs written i ..."
Abstract - Add to MetaCart
The rate at which computers are becoming faster at sequential execution has dropped significantly. Instead parallel processing power is increasing, and multicore computers are becoming more common. Automatically parallelising programs is becoming much more desirable. Parallelising programs written in imperative programming languages is difficult and often leads to unreliable software. Parallelising programs in declarative languages is easy, to the extent that compilers are able to do this automatically. However this often leads to cases where the overheads of parallel execution outweigh the speedup that might have been available by parallelising the program. This thesis describes a new implicit parallelism implementation that calculates the speedup due to parallelism in dependent conjunctions for Mercury — a purely declarative logic programming language. This is done by analysing profiling data and a representation of the program in order

Chapter 1 Low Pain

by Vs No Pain, Multi-core Haskells, M. Kh. Aswad, P. W. Trinder, A. D. Al Zain, G. J. Michaelson
"... Abstract: Multicores are becoming the dominant processor technology and functional languages are theoretically well suited to exploit them. In practice, however, implementing effective high level parallel functional languages is extremely challenging. This paper is the first programming and performa ..."
Abstract - Add to MetaCart
Abstract: Multicores are becoming the dominant processor technology and functional languages are theoretically well suited to exploit them. In practice, however, implementing effective high level parallel functional languages is extremely challenging. This paper is the first programming and performance comparison of functional multicore technologies and reports some of the first ever multicore results for two languages. As such it reflects the growing maturity of the field by systematically evaluating four parallel Haskell implementations on a common multicore architecture. The comparison contrasts the programming effort each language requires with the parallel performance delivered. The study uses 15 ’typical ’ programs to compare a ’no pain’, i.e. entirely implicit, parallel language with three ’low pain’, i.e. semi-explicit languages. The parallel Haskell implementations use different versions of GHC compiler technology, and hence the comparative performance metric is speedup which normalises against sequential performance. We ground the speedup comparisons by reporting both sequential and parallel runtimes and efficiencies for three of the languages. Our experiments focus on the number of programs improved, the absolute speedups delivered, the parallel scalability, and the program changes required to coordinate parallelism. The results are encouraging and, on occasion, surprising.

Compress-and-Conquer for Optimal Multicore Computing

by Zhijing G. Mou, Sinovate Llc, Hai Liu, Paul Hudak
"... We propose a programming paradigm called compress-and-conquer (CC) that leads to optimal performance on multicore platforms. Given a multicore system of p cores and a problem of size n, the problem is first reduced to p smaller problems, each of which can be solved independently of the others (the c ..."
Abstract - Add to MetaCart
We propose a programming paradigm called compress-and-conquer (CC) that leads to optimal performance on multicore platforms. Given a multicore system of p cores and a problem of size n, the problem is first reduced to p smaller problems, each of which can be solved independently of the others (the compression phase). From the solutions to the p problems, a compressed version of the same problem of size O(p) is deduced and solved (the global phase). The solution to the original problem is then derived from the solution to the compressed problem together with the solutions of the smaller problems (the expansion phase). The CC paradigm reduces the complexity of multicore programming by allowing the best-known sequential algorithm for a problem to be used in each of the three phases. In this paper we apply the CC paradigm to a range of problems including scan, nested scan, difference equations, banded linear systems, and linear tridiagonal systems. The performance of CC programs is analyzed, and their optimality and linear speedup are proven. Characteristics of the problem space subject to CC are formally examined, and we show that its computational power subsumes that of scan, nested scan, and mapReduce. The CC paradigm has been implemented in Haskell as a modular, higher-order function, whose constituent functions can be shared by seemingly unrelated problems. This function is compiled into low-level Haskell threads that run on a multicore machine, and performance benchmarks confirm the theoretical analysis. D.1.3 [Parallel Program-
The National Science Foundation
  • About CiteSeerX
  • Submit Documents
  • Privacy Policy
  • Help
  • Data
  • Source
  • Contact Us

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University