Results 11 - 15 of 15
Low-Contention Depth-First Scheduling of Parallel Computations with Write-Once Synchronization Variables
 In Proc. 13th ACM Symp. on Parallel Algorithms and Architectures (SPAA), 2001
Abstract

Cited by 2 (0 self)
We present an efficient, randomized, online scheduling algorithm for a large class of programs with write-once synchronization variables. The algorithm combines the work-stealing paradigm with the depth-first scheduling technique, resulting in high space efficiency and good time complexity. By automatically increasing the granularity of the work scheduled on each processor, our algorithm achieves good locality, low contention and low scheduling overhead, improving upon a previous depth-first scheduling algorithm [6] published in SPAA'97. Moreover, it is provably efficient for the general class of multithreaded computations with write-once synchronization variables (as studied in [6]), improving upon algorithm DFDeques (published in SPAA'99 [24]), which applies only to the more restricted class of nested parallel computations. More specifically, consider such a computation with work T1, depth T∞ and σ synchronizations, and suppose that space S1 suffices to execute the computation on a single-processor computer. Then, on a P-processor shared-memory parallel machine, the expected space complexity of our algorithm is at most S1 + O(P·T∞·log(P·T∞)), and its expected time complexity is O(T1/P + σ·log(P·T∞)/P + T∞·log(P·T∞)). Moreover, for any ε > 0, the space complexity of our algorithm is S1 + O(P·(T∞ + ln(1/ε))·log(P·(T∞ + ln(P·(T∞ + ln(1/ε))/ε)))) with probability at least 1 − ε. Thus, even for values of ε as small as e^(−T∞), the space complexity of our algorithm is at most S1 + O(P·T∞·log(P·T∞)) with probability at least 1 − e^(−T∞). These bounds include all time and space costs for both the computation and the scheduler.
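As a rough illustration of the work-stealing half of the approach (not the paper's algorithm — the scheduler, its synchronization-variable handling, and its analysis are far more involved), a minimal sketch: each worker owns a deque, works depth-first off its own bottom, and steals oldest work from a random victim's top. The names `run_work_stealing` and `spawn`, and the deterministic round-robin simulation, are our own simplifications:

```python
import random
from collections import deque

def run_work_stealing(root_tasks, num_workers=4, seed=0):
    """Simulate work-stealing: each worker pushes and pops spawned tasks
    at the bottom of its own deque (LIFO, i.e. depth-first order, which
    gives locality) and steals from the top of a random non-empty victim
    (FIFO, i.e. oldest work) when its own deque is empty.
    A task is a zero-argument callable returning a list of child tasks."""
    rng = random.Random(seed)
    deques = [deque() for _ in range(num_workers)]
    deques[0].extend(root_tasks)
    executed = 0
    while any(deques):
        for w in range(num_workers):
            if deques[w]:
                task = deques[w].pop()                        # own work: LIFO
            else:
                victims = [v for v in range(num_workers) if deques[v]]
                if not victims:
                    continue
                task = deques[rng.choice(victims)].popleft()  # steal: FIFO
            for child in task():
                deques[w].append(child)
            executed += 1
    return executed

# Example: a binary spawn tree of depth 3 has 1+2+4+8 = 15 tasks.
def spawn(depth):
    def task():
        return [spawn(depth - 1), spawn(depth - 1)] if depth > 0 else []
    return task

print(run_work_stealing([spawn(3)]))  # → 15
```

The real algorithm's contribution lies precisely in what this sketch omits: bounding space and steal contention while supporting write-once synchronization variables.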
Reasoning about Selective Strictness - Operational Equivalence, Heaps and Call-by-Need Evaluation, New Inductive Principles
, 2009
Abstract

Cited by 1 (0 self)
Many predominantly lazy languages now incorporate strictness-enforcing primitives, for example a strict let or the sequential composition operator seq. Reasons for doing this include gains in time or space efficiency, or control of parallel evaluation. This thesis studies how to prove equivalences between programs in languages with selective strictness. Specifically, we use a restricted core lazy functional language with a selective strictness operator seq whose operational semantics is a variant of one considered by van Eekelen and de Mol, which was itself derived from Launchbury's natural semantics for lazy evaluation. The main research contributions are as follows: we establish some of the first ever equivalences between programs with selective strictness. We do this by manipulating operational semantics derivations.
Reasoning about Selective Strictness - Operational Equivalence, Heaps and Call-by-Need Evaluation, New Inductive Principles
, 2009
Abstract
This thesis studies how to prove equivalences between programs in languages with selective strictness. Specifically, we use a restricted core lazy functional language with a selective strictness operator seq. We establish some of the first ever equivalences between lazy programs with selective strictness by manipulating operational semantics derivations. Our operational semantics is similar to that used by van Eekelen and de Mol, though we introduce a 'garbage-collecting' rule for (let) which turns out to cause expressiveness restrictions. For example, arguably reasonable lazy programs such as let y = λz.z in λx.y do not reduce in our operational semantics. We prove some properties of seq, including associativity, idempotence, and left-commutativity. The proofs use our three notions of program equivalence.
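To make the role of seq concrete, here is a tiny call-by-need model (our own illustration, not the thesis's formal semantics): a `Thunk` memoizes its value on first force, and `seq` forces its first argument before returning its second. The memoization is what makes forcing idempotent, which is the intuition behind properties like x `seq` (x `seq` y) = x `seq` y:

```python
# Illustrative sketch: laziness via memoized thunks, plus a seq that
# forces its first argument. All names here are our own, hypothetical ones.
class Thunk:
    def __init__(self, compute):
        self.compute = compute
        self.forced = False
        self.value = None

    def force(self):
        if not self.forced:            # memoize: evaluate at most once
            self.value = self.compute()
            self.forced = True
        return self.value

def seq(a, b):
    a.force()                          # force the first argument to a value
    return b                           # return the second unevaluated

log = []
x = Thunk(lambda: log.append("x") or 1)
y = Thunk(lambda: log.append("y") or 2)

r = seq(x, y)          # forces x, leaves y an unevaluated thunk
print(log)             # → ['x']
print(r.force())       # → 2

# Forcing is idempotent: seq(x2, seq(x2, y)) evaluates x2's body once.
log.clear()
x2 = Thunk(lambda: log.append("x2") or 1)
seq(x2, seq(x2, y)).force()
print(log)             # → ['x2']
```

Note that Python evaluates arguments eagerly, so the nesting order here differs from a lazy language; the memoized thunk is what carries the laziness in this model.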
Cache and I/O Efficient Functional Algorithms
Abstract
The widely studied I/O and ideal-cache models were developed to account for the large difference in costs to access memory at different levels of the memory hierarchy. Both models are based on a two-level memory hierarchy with a fixed-size primary memory (cache) of size M and an unbounded secondary memory organized in blocks of size B. The cost measure is based purely on the number of block transfers between the primary and secondary memory; all other operations are free. Many algorithms have been analyzed in these models, and indeed these models predict the relative performance of algorithms much more accurately than the standard RAM model. The models, however, require specifying algorithms at a very low level, requiring the user to carefully lay out their data in arrays in memory and manage their own memory allocation. In this paper we present a cost model for analyzing the memory efficiency of algorithms expressed in a simple functional language. We show how some algorithms written in standard forms using just lists and trees (no arrays), and requiring no explicit memory layout or memory management, are efficient in the model. We then describe an implementation of the language and show provable bounds for mapping the cost in our model to the cost in the ideal-cache model. These bounds imply that purely functional programs based on lists and trees, with no special attention to any details of memory layout, can be asymptotically as efficient as carefully designed imperative I/O-efficient algorithms. For example, we describe a sorting algorithm with cost O((n/B) log_{M/B} n).
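The two bounds at play can be made concrete with a small calculator (an assumed, constants-omitted reading of the model: scanning n items costs about n/B block transfers; an (M/B)-way merge sort costs about (n/B)·log_{M/B}(n/B)). The function names and parameter choices below are our own illustration:

```python
import math

def scan_io(n, B):
    """Block transfers to read n consecutive items in blocks of size B."""
    return math.ceil(n / B)

def sort_io(n, M, B):
    """Transfers for an M/B-way merge sort in the I/O model:
    n/B blocks moved per pass, log_{M/B}(n/B) merge passes."""
    runs = math.ceil(n / B)
    fanout = max(2, M // B)            # merge arity afforded by the cache
    passes = max(1, math.ceil(math.log(runs, fanout)))
    return runs * passes

n, M, B = 1_000_000, 65_536, 64
print(scan_io(n, B))       # → 15625 block transfers to scan
print(sort_io(n, M, B))    # → 31250: only 2 passes, thanks to the large fanout
```

The point of the comparison is that the log base is M/B, not 2: with a realistically large cache the sort makes only a handful of passes over the data.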
Abstract Analysis of Method-Level Speculation
, 2011
Abstract
Thread-level Speculation (TLS) is a technique for automatic parallelization that has shown excellent results in hardware simulation studies. Existing studies, however, typically require a full stack of analyses, hardware components, and performance assumptions in order to demonstrate and measure speedup, limiting the ability to vary fundamental choices and making basic design comparisons difficult. Here we approach the problem analytically, abstracting several variations on a general form of TLS (method-level speculation) and using our abstraction to model the performance of TLS on common coding idioms. Our investigation is based on exhaustive exploration, and we are able to show how optimal performance is strongly limited by program structure and core choices in speculation design, irrespective of data dependencies. These results provide new, high-level insight into where and how thread-level speculation can and should be applied in order to produce practical speedup.
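A toy timing model conveys the basic trade-off (this is our own illustrative construction, not the paper's abstraction): at a method call, a second thread speculatively runs the method's continuation in parallel with the call; a dependence violation squashes the speculation and the continuation re-executes after the call:

```python
def sequential_time(call, cont):
    """Baseline: the call, then its continuation, run back to back."""
    return call + cont

def speculative_time(call, cont, violates):
    """Method-level speculation, idealized: call and continuation overlap
    unless the continuation reads a value the call writes (a violation),
    in which case the speculative work is wasted and redone serially."""
    if violates:
        return call + cont             # squash: no better than sequential
    return max(call, cont)             # perfect overlap of the two halves

# Best case, equally sized halves: 2x speedup.
print(sequential_time(10, 10) / speculative_time(10, 10, violates=False))  # → 2.0
# A violation removes all benefit (and real rollback costs would make it worse).
print(sequential_time(10, 10) / speculative_time(10, 10, violates=True))   # → 1.0
```

Even this crude model shows why program structure caps the benefit: the overlap is bounded by the shorter of the call and its continuation, independent of how the hardware is built.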