A provable time and space efficient implementation of NESL
- In International Conference on Functional Programming
, 1996
Cited by 60 (7 self)
In this paper we prove time and space bounds for the implementation of the programming language NESL on various parallel machine models. NESL is a sugared typed λ-calculus with a set of array primitives and an explicit parallel map over arrays. Our results extend previous work on provable implementation bounds for functional languages by considering space and by including arrays. For modeling the cost of NESL we augment a standard call-by-value operational semantics to return two cost measures: a DAG representing the sequential dependence in the computation, and a measure of the space taken by a sequential implementation. We show that a NESL program with w work (nodes in the DAG), d depth (levels in the DAG), and s sequential space can be implemented on a p-processor butterfly network, hypercube, or CRCW PRAM using O(w/p + d log p) time and O(s + dp log p) reachable space. For programs with sufficient parallelism these bounds are optimal in that they give linear speedup and use space within a constant factor of the sequential space.
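To make the work and depth measures concrete, here is a minimal Python sketch under a hypothetical toy cost model (the names Op, Seq, Par, work, depth and time_bound are illustrative and do not come from the paper): a computation is a unit operation, a sequential composition, or a parallel fork, and fork/join overhead is ignored. Work sums over all parts; depth adds along sequential composition and takes the maximum over parallel branches. time_bound evaluates w/p + d log p with the constant factors hidden by the O(·) dropped.

```python
from dataclasses import dataclass, field
from math import log2
from typing import List, Union


@dataclass
class Op:
    """A single unit-cost operation: one DAG node."""


@dataclass
class Seq:
    """Sequential composition: each part depends on the previous one."""
    parts: List["Expr"] = field(default_factory=list)


@dataclass
class Par:
    """Parallel fork (e.g. a parallel map): branches are independent."""
    branches: List["Expr"] = field(default_factory=list)


Expr = Union[Op, Seq, Par]


def work(e: Expr) -> int:
    """Total number of nodes in the DAG (w)."""
    if isinstance(e, Op):
        return 1
    children = e.parts if isinstance(e, Seq) else e.branches
    return sum(work(c) for c in children)


def depth(e: Expr) -> int:
    """Length of the longest dependence chain in the DAG (d)."""
    if isinstance(e, Op):
        return 1
    if isinstance(e, Seq):
        return sum(depth(c) for c in e.parts)
    return max((depth(c) for c in e.branches), default=0)


def time_bound(w: int, d: int, p: int) -> float:
    """w/p + d*log2(p): the shape of the O(w/p + d log p) bound, constants dropped."""
    return w / p + d * log2(p)


# A parallel map over 8 elements, each doing two dependent operations.
example = Par([Seq([Op(), Op()]) for _ in range(8)])
```

For this example w = 16 and d = 2, so with p = 4 the expression evaluates to 16/4 + 2·2 = 8.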
Space-Efficient Scheduling of Parallelism with Synchronization Variables
Cited by 28 (10 self)
Recent work on scheduling algorithms has resulted in provable bounds on the space taken by parallel computations in relation to the space taken by sequential computations. The results for online versions of these algorithms, however, have been limited to computations in which threads can only synchronize with ancestor or sibling threads. Such computations do not include languages with futures or user-specified synchronization constraints. Here we extend the results to languages with synchronization variables. Such languages include languages with futures, such as Multilisp and Cool, as well as other languages such as Id. The main result is an online scheduling algorithm which, given a computation with w work (total operations), σ synchronizations, d depth (critical path) and s1 sequential space, will run in O(w/p + σ log(pd)/p + d log(pd)) time and s1 + O(pd log(pd)) space on a p-processor CRCW PRAM with a fetch-and-add primitive. This includes all time and space costs for both the computation and the scheduler. The scheduler is non-preemptive in the sense that it will only move a thread if the thread suspends on a synchronization, forks a new thread, or exceeds a threshold when allocating space. For the special case where the computation is a planar graph with left-to-right synchronization edges, the scheduling algorithm can be implemented in O(w/p + d log p) time and s1 + O(pd log p) space. These are the first nontrivial space bounds described for such languages.
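The depth-first prioritization behind such schedulers can be illustrated with a small offline simulation. This sketch is a deliberate simplification, not the paper's online algorithm: it ignores the fetch-and-add machinery and the space thresholds, treats synchronization-variable dependencies as ordinary DAG edges, and assumes the nodes are already numbered in sequential depth-first (1DF) order. At each unit-time step it greedily runs up to p ready nodes, preferring the smallest 1DF numbers. The name pdf_schedule is hypothetical.

```python
import heapq
from typing import List, Sequence, Tuple


def pdf_schedule(n: int, edges: Sequence[Tuple[int, int]], p: int) -> List[List[int]]:
    """Greedy p-processor schedule of a DAG with nodes 0..n-1 numbered in
    sequential depth-first (1DF) order. Each unit-time step runs up to p
    ready nodes, preferring the smallest 1DF numbers."""
    preds = [0] * n                   # number of unfinished predecessors
    succs: List[List[int]] = [[] for _ in range(n)]
    for u, v in edges:
        succs[u].append(v)
        preds[v] += 1
    ready = [v for v in range(n) if preds[v] == 0]
    heapq.heapify(ready)              # min-heap keyed on 1DF number
    steps: List[List[int]] = []
    while ready:
        # Pick the p highest-priority (earliest sequential) ready nodes.
        batch = [heapq.heappop(ready) for _ in range(min(p, len(ready)))]
        steps.append(batch)
        # Nodes enabled by this step become ready only for the next step.
        for u in batch:
            for v in succs[u]:
                preds[v] -= 1
                if preds[v] == 0:
                    heapq.heappush(ready, v)
    return steps


# Fork-join DAG: node 0 forks 1..4, which all join at node 5 (w = 6, d = 3).
fork_join = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5), (2, 5), (3, 5), (4, 5)]
```

Because the schedule is greedy, its length is at most w/p + d; for the fork-join DAG above with p = 2 it takes 4 steps ([0], [1, 2], [3, 4], [5]) against a bound of 6.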
From Sequential Programs to Multi-Tier Applications by Program Transformation
, 2005
Cited by 9 (0 self)
Modern applications are designed in multiple tiers to separate concerns. Since each tier may run at a separate location, middleware is required to mediate access between tiers. However, introducing this middleware is tiresome and error-prone.
Space-profiling semantics of the call-by-value lambda calculus and the CPS transformation
- In The 3rd International Workshop on Higher Order Operational Techniques in Semantics, volume 26 of Electronic Notes in Theoretical Computer Science
, 1999
Cited by 8 (2 self)
We show that the CPS transformation from the call-by-value lambda calculus to a CPS language preserves the space required for execution of a program within a constant factor. For the call-by-value lambda calculus we adopt a space-profiling semantics based on the profiling semantics of NESL by Blelloch and Greiner. However, we have noticed that their semantics treats stack space and heap space inconsistently. To obtain our result, we therefore revise the semantics so that it treats space in a more consistent manner.
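For readers unfamiliar with the transformation, here is a minimal Python sketch of the textbook Plotkin-style call-by-value CPS transformation on untyped lambda terms. The paper's CPS language and its space-profiling semantics may differ; the Var, Lam and App classes, the fresh-name scheme (which assumes source terms never use names such as k1 or m2), and the tiny evaluator used as a sanity check are illustrative assumptions. The sketch says nothing about space; it only shows the shape of the transformed program.

```python
from dataclasses import dataclass
from itertools import count
from typing import Callable, Optional, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str
    body: "Term"


@dataclass(frozen=True)
class App:
    fun: "Term"
    arg: "Term"


Term = Union[Var, Lam, App]


def cps(term: Term, fresh: Optional[Callable[[str], str]] = None) -> Term:
    """Plotkin-style call-by-value CPS transformation."""
    if fresh is None:
        counter = count(1)
        # Assumes generated names (k1, m2, ...) never clash with source names.
        fresh = lambda base: f"{base}{next(counter)}"
    k = fresh("k")
    if isinstance(term, Var):
        # [[x]] = λk. k x
        return Lam(k, App(Var(k), term))
    if isinstance(term, Lam):
        # [[λx. M]] = λk. k (λx. [[M]])
        return Lam(k, App(Var(k), Lam(term.param, cps(term.body, fresh))))
    # [[M N]] = λk. [[M]] (λm. [[N]] (λn. m n k))
    m, n = fresh("m"), fresh("n")
    inner = Lam(n, App(App(Var(m), Var(n)), Var(k)))
    return Lam(k, App(cps(term.fun, fresh), Lam(m, App(cps(term.arg, fresh), inner))))


def show(t: Term) -> str:
    """Fully parenthesized rendering of a term."""
    if isinstance(t, Var):
        return t.name
    if isinstance(t, Lam):
        return f"(λ{t.param}. {show(t.body)})"
    return f"({show(t.fun)} {show(t.arg)})"


def evaluate(t: Term, env: dict):
    """Tiny call-by-value evaluator; lambdas become Python closures."""
    if isinstance(t, Var):
        return env[t.name]
    if isinstance(t, Lam):
        return lambda v: evaluate(t.body, {**env, t.param: v})
    f = evaluate(t.fun, env)      # evaluate the function first,
    a = evaluate(t.arg, env)      # then the argument (call-by-value)
    return f(a)
```

Running a transformed term with the identity continuation should give the same value as running the source term; for (λx. x) y with y bound to 42, both yield 42.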
Pipelining with Futures
, 1997
Cited by 6 (4 self)
Pipelining has been used in the design of many PRAM algorithms to reduce their asymptotic running time. Paul, Vishkin, and Wagener (PVW) used the approach in a parallel implementation of 2-3 trees. The approach was later used by Cole in the first O(lg n) time sorting algorithm on the PRAM not based on the AKS sorting network, and has since been used to improve the time of several other algorithms. Although the approach has improved the asymptotic time of many algorithms, there are two practical problems: maintaining the pipeline is quite complicated for the programmer, and the pipelining forces highly synchronous code execution. Synchronous execution is less practical on asynchronous machines and makes it difficult to modify a schedule to use less memory or to take better advantage of locality.
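The general idea of pipelining through futures can be sketched with Python's concurrent.futures: a producer returns a list whose tail is a future, so a consumer can start on early elements before the producer has finished. This is a generic illustration under stated assumptions (a two-stage squaring/running-sum pipeline with hypothetical names stage1, stage2 and pipeline), not the paper's PRAM algorithms; under CPython's global interpreter lock it shows the dependency structure rather than a speedup.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List, Optional, Tuple

# A "future list" cell is either None (end) or (value, future of the next cell).
Cell = Optional[Tuple[int, Future]]


def stage1(pool: ThreadPoolExecutor, xs: List[int], i: int = 0) -> Future:
    """Producer: a future for the i-th cell, holding xs[i] ** 2."""
    def make_cell() -> Cell:
        if i == len(xs):
            return None
        # Start producing the rest of the list without waiting for it.
        return xs[i] ** 2, stage1(pool, xs, i + 1)
    return pool.submit(make_cell)


def stage2(head: Future) -> List[int]:
    """Consumer: running sums, taken as soon as each cell is ready."""
    total, out = 0, []
    cell = head.result()          # blocks only until this one cell exists
    while cell is not None:
        value, rest = cell
        total += value
        out.append(total)
        cell = rest.result()
    return out


def pipeline(xs: List[int]) -> List[int]:
    with ThreadPoolExecutor(max_workers=2) as pool:
        return stage2(stage1(pool, xs))
```

Producer tasks never block on other futures, which keeps the small thread pool from deadlocking; only the consumer waits. For example, pipeline([1, 2, 3, 4]) returns [1, 5, 14, 30].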
A new criterion for safe program transformations
- In Proceedings of the Fourth International Workshop on Higher Order Operational Techniques in Semantics (HOOTS), volume 41(3) of ENTCS
, 2000
Cited by 4 (2 self)
Previous studies on the safety of program transformations with respect to performance considered two criteria: preserving performance within a constant factor and preserving complexity. However, as a requirement for program transformations used in compilers, the former seems too restrictive and the latter too loose. We propose a new safety criterion: a program transformation preserves performance within a factor proportional to the size of the source program. This criterion seems natural since several compilation methods affect performance by a factor proportional to the size of a program. Based on this criterion we have shown that two semantics formalizing the size of stack space are equivalent. We also discuss the connection between this criterion and the properties of local program transformations that rewrite parts of a program.
Low-Contention Depth-First Scheduling of Parallel Computations with Write-Once Synchronization Variables
- In Proc. 13th ACM Symp. on Parallel Algorithms and Architectures (SPAA)
, 2001
Cited by 3 (0 self)
We present an efficient, randomized, online scheduling algorithm for a large class of programs with write-once synchronization variables. The algorithm combines the work-stealing paradigm with the depth-first scheduling technique, resulting in high space efficiency and good time complexity. By automatically increasing the granularity of the work scheduled on each processor, our algorithm achieves good locality, low contention and low scheduling overhead, improving upon a previous depth-first scheduling algorithm [6] published in SPAA'97. Moreover, it is provably efficient for the general class of multithreaded computations with write-once synchronization variables (as studied in [6]), improving upon algorithm DFDeques (published in SPAA'99 [24]), which applies only to the more restricted class of nested parallel computations. More specifically, consider such a computation with work T1, depth T∞ and σ synchronizations, and suppose that space S1 suffices to execute the computation on a single-processor computer. Then, on a P-processor shared-memory parallel machine, the expected space complexity of our algorithm is at most S1 + O(P T∞ log(P T∞)), and its expected time complexity is O(T1/P + σ log(P T∞)/P + T∞ log(P T∞)). Moreover, for any ε > 0, the space complexity of our algorithm is S1 + O(P (T∞ + ln(1/ε)) log(P (T∞ + ln(P (T∞ + ln(1/ε))/ε)))) with probability at least 1 − ε. Thus, even for values of ε as small as e^(−T∞), the space complexity of our algorithm is at most S1 + O(P T∞ log(P T∞)) with probability at least 1 − e^(−T∞). These bounds include all time and space costs for both the computation and the scheduler.
Speculative multithreading in a Java virtual machine
, 2005
Cited by 2 (0 self)
Speculative multithreading (SpMT) is a dynamic program parallelisation technique that promises dramatic speedup of irregular, pointer-based programs as well as numerical, loop-based programs. We present the design and implementation of software-only SpMT for Java at the virtual machine level. We take the full Java language into account and we are able to run and analyse real world benchmarks in reasonable execution times on commodity multiprocessor hardware. We provide an experimental analysis of benchmark behaviour, uncovered parallelism, the impact of return value prediction, processor scalability, and a breakdown of overhead costs.
Understanding method level speculation
- Sable Research Group, School of Computer Science, McGill University
, 2009
Cited by 1 (1 self)
Method level speculation (MLS) is an optimistic technique for parallelizing imperative programs, for which a variety of MLS systems and optimizations have been proposed. However, runtime performance strongly depends on the interaction between program structure and MLS system design choices, making it difficult to compare approaches or understand in a general way how programs behave under MLS. Here we develop an abstract list-based model of speculative execution that encompasses several MLS designs, and a concrete stack-based model that is suitable for implementations. Using our abstract model, we show equivalence and correctness for a variety of MLS designs, unifying in-order and out-of-order execution models. Using our concrete model, we directly explore the execution behaviour of simple imperative programs, and show how specific parallelization patterns are induced by combining common programming idioms with precise speculation decisions. This basic groundwork establishes a common basis for understanding MLS designs, and suggests more formal directions for optimizing MLS behaviour and application.
Scheduling Deterministic Parallel Programs
, 2009
Cited by 1 (0 self)
Deterministic parallel programs yield the same results regardless of how parallel tasks are interleaved or assigned to processors. This drastically simplifies reasoning about the correctness of these programs. However, the performance of parallel programs still depends upon this assignment of tasks, as determined by a part of the language implementation called the scheduling policy. In this thesis, I define a novel cost semantics for a parallel language that enables programmers to reason formally about different scheduling policies. This cost semantics forms a basis for a suite of prototype profiling tools. These tools allow programmers to simulate and visualize program execution under different scheduling policies and understand how the choice of policy affects application memory use. My cost semantics also provides a specification for implementations of the language. As an example of such an implementation, I have extended MLton, a compiler
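How the choice of scheduling policy can change memory use is visible even in a deliberately crude single-processor simulation (an assumption, far simpler than the cost semantics described above): a complete binary fork tree is executed either depth-first (LIFO) or breadth-first (FIFO), and the number of spawned-but-unexecuted tasks serves as a proxy for memory. The function peak_pending and its policy strings are hypothetical.

```python
from collections import deque


def peak_pending(depth: int, policy: str) -> int:
    """Run a complete binary fork tree of the given depth on one processor.
    Every task above the leaf level forks two children; leaves just finish.
    Returns the largest number of spawned-but-unexecuted tasks, used here as
    a rough proxy for the memory held by pending work."""
    pending = deque([0])               # each entry is a task's level in the tree
    peak = 1
    while pending:
        if policy == "depth-first":
            level = pending.pop()      # LIFO: run the most recently spawned task
        elif policy == "breadth-first":
            level = pending.popleft()  # FIFO: run the oldest spawned task
        else:
            raise ValueError(f"unknown policy: {policy}")
        if level < depth:
            pending.extend([level + 1, level + 1])
        peak = max(peak, len(pending))
    return peak
```

For a tree of depth 10, the depth-first policy never holds more than 11 pending tasks, while the breadth-first policy holds all 1,024 leaves at once, even though both perform exactly the same work.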

