Results 1  10
of
17
A provable time and space efficient implementation of nesl
 In International Conference on Functional Programming
, 1996
"... In this paper we prove time and space bounds for the implementation of the programming language NESL on various parallel machine models. NESL is a sugared typed Jcalculus with a set of array primitives and an explicit parallel map over arrays. Our results extend previous work on provable implementa ..."
Abstract

Cited by 73 (8 self)
 Add to MetaCart
In this paper we prove time and space bounds for the implementation of the programming language NESL on various parallel machine models. NESL is a sugared typed Jcalculus with a set of array primitives and an explicit parallel map over arrays. Our results extend previous work on provable implementation bounds for functional languages by considering space and by including arrays. For modeling the cost of NESL we augment a standard callbyvalue operational semantics to return two cost measures: a DAG representing the sequential dependence in the computation, and a measure of the space taken by a sequential implementation. We show that a NESL program with w work (nodes in the DAG), d depth (levels in the DAG), and s sequential space can be implemented on a p processor butterfly network, hypercube, or CRCW PRAM usin O(w/p + d log p) time and 0(s + dp logp) reachable space. For programs with sufficient parallelism these bounds are optimal in that they give linew speedup and use space within a constant factor of the sequential space. 1
SpaceEfficient Scheduling of Parallelism with Synchronization Variables
"... Recent work on scheduling algorithms has resulted in provable bounds on the space taken by parallel computations in relation to the space taken by sequential computations. The results for online versions of these algorithms, however, have been limited to computations in which threads can only synchr ..."
Abstract

Cited by 27 (9 self)
 Add to MetaCart
Recent work on scheduling algorithms has resulted in provable bounds on the space taken by parallel computations in relation to the space taken by sequential computations. The results for online versions of these algorithms, however, have been limited to computations in which threads can only synchronize with ancestor or sibling threads. Such computations do not include languages with futures or userspecified synchronization constraints. Here we extend the results to languages with synchronization variables. Such languages include languages with futures, such as Multilisp and Cool, as well as other languages such asid. The main result is an online scheduling algorithm which, given a computation with w work (total operations), synchronizations, d depth (critical path) and s1 sequential space, will run in O(w=p + log(pd)=p + d log(pd)) time and s1 + O(pd log(pd)) space, on a pprocessor crcw pram with a fetchandadd primitive. This includes all time and space costs for both the computation and the scheduler. The scheduler is nonpreemptive in the sense that it will only move a thread if the thread suspends on a synchronization, forks a new thread, or exceeds a threshold when allocating space. For the special case where the computation is a planar graph with lefttoright synchronization edges, the scheduling algorithm can be implemented in O(w=p+d log p) time and s1 + O(pd log p) space. These are the first nontrivial space bounds described for such languages.
From Sequential Programs to MultiTier Applications by Program Transformation
, 2005
"... Modern applications are designed in multiple tiers to separate concerns. Since each tier may run at a separate location, middleware is required to mediate access between tiers. However, introducing this middleware is tiresome and errorprone. ..."
Abstract

Cited by 16 (0 self)
 Add to MetaCart
Modern applications are designed in multiple tiers to separate concerns. Since each tier may run at a separate location, middleware is required to mediate access between tiers. However, introducing this middleware is tiresome and errorprone.
Space Profiling for Parallel Functional Programs
"... This paper presents a semantic space profiler for parallel functional programs. Building on previous work in sequential profiling, our tools help programmers to relate runtime resource use back to program source code. Unlike many profiling tools, our profiler is based on a cost semantics. This provi ..."
Abstract

Cited by 13 (2 self)
 Add to MetaCart
This paper presents a semantic space profiler for parallel functional programs. Building on previous work in sequential profiling, our tools help programmers to relate runtime resource use back to program source code. Unlike many profiling tools, our profiler is based on a cost semantics. This provides a means to reason about performance without requiring a detailed understanding of the compiler or runtime system. It also provides a specification for language implementers. This is critical in that it enables us to separate cleanly the performance of the application from that of the language implementation. Some aspects of the implementation can have significant effects on performance. Our cost semantics enables programmers to understand the impact of different scheduling policies yet abstracts away from many of the details of their implementations. We show applications where the choice of scheduling policy has asymptotic effects on space use. We explain these use patterns through a demonstration of our tools. We also validate our methodology by observing similar performance in our implementation of a parallel extension of Standard ML.
Pipelining with Futures
, 1997
"... Pipelining has been used in the design of many PRAM algorithms to reduce their asymptotic running time. Paul, Vishkin, and Wagener (PVW) used the approach in a parallel implementation of 23 trees. The approach was later used by Cole in the first O(lg n) time sorting algorithm on the PRAM not based ..."
Abstract

Cited by 8 (5 self)
 Add to MetaCart
Pipelining has been used in the design of many PRAM algorithms to reduce their asymptotic running time. Paul, Vishkin, and Wagener (PVW) used the approach in a parallel implementation of 23 trees. The approach was later used by Cole in the first O(lg n) time sorting algorithm on the PRAM not based on the AKS sorting network, and has since been used to improve the time of several other algorithms. Although the approach has improved the asymptotic time of many algorithms, there are two practical problems: maintaining the pipeline is quite complicated for the programmer, and the pipelining forces highly synchronous code execution. Synchronous execution is less practical on asynchronous machines and makes it difficult to modify a schedule to use less memory or to take better advantage of locality.
Spaceprofiling semantics of the callbyvalue lambda calculus and the CPS transformation
 In The 3rd International Workshop on Higher Order Operational Techniques in Semantics, volume 26 of Electronic Notes in Theoretical Computer Science
, 1999
"... We show that the CPS transformation from the callbyvalue lambda calculus to a CPS language preserves space required for execution of a program within a constant factor. For the callbyvalue lambda calculus we adopt a spaceprofiling semantics based on the profiling semantics of NESL by Blelloch a ..."
Abstract

Cited by 8 (2 self)
 Add to MetaCart
We show that the CPS transformation from the callbyvalue lambda calculus to a CPS language preserves space required for execution of a program within a constant factor. For the callbyvalue lambda calculus we adopt a spaceprofiling semantics based on the profiling semantics of NESL by Blelloch and Greiner. However, we have noticed their semantics has some inconsistency between the treatments of stack space and heap space. This requires us to revise the semantics so that the semantics treats space in more consistent manner in order to obtain our result. 1
A new criterion for safe program transformations
 In Proceedings of the Forth International Workshop on Higher Order Operational Techniques in Semantics (HOOTS), volume 41(3) of ENTCS
, 2000
"... Previous studies on safety of program transformations with respect to performance considered two criteria: preserving performance within a constant factor and preserving complexity. However, as the requirement of program transformations used in compilers the former seems too restrictive and the latt ..."
Abstract

Cited by 4 (2 self)
 Add to MetaCart
Previous studies on safety of program transformations with respect to performance considered two criteria: preserving performance within a constant factor and preserving complexity. However, as the requirement of program transformations used in compilers the former seems too restrictive and the latter seems too loose. We propose a new safety criterion: a program transformation preserves performance within a factor proportional to the size of a source program. This criterion seems natural since several compilation methods have effects on performance proportional to the size of a program. Based on this criterion we have shown that two semantics formalizing the size of stack space are equivalent. We also discuss the connection between this criterion and the properties of local program transformations rewriting parts of a program. 1
Scheduling Deterministic Parallel Programs
, 2009
"... are those of the author and should not be interpreted as representing the official policies, either expressed or implied, Deterministic parallel programs yield the same results regardless of how parallel tasks are interleaved or assigned to processors. This drastically simplifies reasoning about the ..."
Abstract

Cited by 3 (0 self)
 Add to MetaCart
are those of the author and should not be interpreted as representing the official policies, either expressed or implied, Deterministic parallel programs yield the same results regardless of how parallel tasks are interleaved or assigned to processors. This drastically simplifies reasoning about the correctness of these programs. However, the performance of parallel programs still depends upon this assignment of tasks, as determined by a part of the language implementation called the scheduling policy. In this thesis, I define a novel cost semantics for a parallel language that enables programmers to reason formally about different scheduling policies. This cost semantics forms a basis for a suite of prototype profiling tools. These tools allow programmers to simulate and visualize program execution under different scheduling policies and understand how the choice of policy affects application memory use. My cost semantics also provides a specification for implementations of the language. As an example of such an implementation, I have extended MLton, a compiler
Speculative multithreading in a Java virtual machine
, 2005
"... Speculative multithreading (SpMT) is a dynamic program parallelisation technique that promises dramatic speedup of irregular, pointerbased programs as well as numerical, loopbased programs. We present the design and implementation of softwareonly SpMT for Java at the virtual machine level. We tak ..."
Abstract

Cited by 2 (0 self)
 Add to MetaCart
Speculative multithreading (SpMT) is a dynamic program parallelisation technique that promises dramatic speedup of irregular, pointerbased programs as well as numerical, loopbased programs. We present the design and implementation of softwareonly SpMT for Java at the virtual machine level. We take the full Java language into account and we are able to run and analyse real world benchmarks in reasonable execution times on commodity multiprocessor hardware. We provide an experimental analysis of benchmark behaviour, uncovered parallelism, the impact of return value prediction, processor scalability, and a breakdown of overhead costs. 1
LowContention DepthFirst Scheduling of Parallel Computations with WriteOnce Synchronization Variables
 In Proc. 13th ACM Symp. on Parallel Algorithms and Architectures (SPAA
, 2001
"... We present an efficient, randomized, online, scheduling algorithm for a large class of programs with writeonce synchronization variables. The algorithm combines the workstealing paradigm with the depthfirst scheduling technique, resulting in high space efficiency and good time complexity. By auto ..."
Abstract

Cited by 2 (0 self)
 Add to MetaCart
We present an efficient, randomized, online, scheduling algorithm for a large class of programs with writeonce synchronization variables. The algorithm combines the workstealing paradigm with the depthfirst scheduling technique, resulting in high space efficiency and good time complexity. By automatically increasing the granularity of the work scheduled on each processor, our algorithm achieves good locality, low contention and low scheduling overhead, improving upon a previous depthfirst scheduling algorithm [6] published in SPAA'97. Moreover, it is provably efficient for the general class of multithreaded computations with writeonce synchronization variables (as studied in [6]), improving upon algorithm DFDeques (published in SPAA'99 [24]), which is only for the more restricted class of nested parallel computations. More specifically, consider such a computation with work T1 , depth T1 and oe synchronizations, and suppose that space S1 suffices to execute the computation on a singleprocessor computer. Then, on a Pprocessor sharedmemory parallel machine, the expected space complexity of our algorithm is at most S1 +O(PT1 log(PT1 )), and its expected time complexity is O(T1=P+oe log(PT1)=P+T1 log(PT1 )). Moreover, for any ffl ? 0, the space complexity of our algorithm is S1 + O(P (T1 + ln(1=ffl)) log(P (T1 + ln(P (T1 + ln(1=ffl))=ffl)))) with probability at least 1 \Gamma ffl. Thus, even for values of ffl as small as e \GammaT 1 , the space complexity of our algorithm is at most S1 +O(PT1 log(PT1 )) with probability at least 1 \Gamma e \GammaT 1 . These bounds include all time and space costs for both the computation and the scheduler. 1