Results 1–10 of 10
A Provable Time and Space Efficient Implementation of NESL
 In International Conference on Functional Programming
, 1996
Abstract

Cited by 70 (7 self)
In this paper we prove time and space bounds for the implementation of the programming language NESL on various parallel machine models. NESL is a sugared typed λ-calculus with a set of array primitives and an explicit parallel map over arrays. Our results extend previous work on provable implementation bounds for functional languages by considering space and by including arrays. For modeling the cost of NESL we augment a standard call-by-value operational semantics to return two cost measures: a DAG representing the sequential dependence in the computation, and a measure of the space taken by a sequential implementation. We show that a NESL program with w work (nodes in the DAG), d depth (levels in the DAG), and s sequential space can be implemented on a p-processor butterfly network, hypercube, or CRCW PRAM using O(w/p + d log p) time and O(s + dp log p) reachable space. For programs with sufficient parallelism these bounds are optimal in that they give linear speedup and use space within a constant factor of the sequential space.
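The work/depth bounds quoted in this abstract can be made concrete with a small sketch. This is a Python illustration of the Brent-style scheduling bound behind the O(w/p + d log p) result, with invented helper names; it is not NESL's actual implementation.

```python
# Sketch of a work/depth cost model in the spirit of the NESL bounds
# described above (illustrative only; function names are invented here).

def par_map_cost(n, elem_work, elem_depth):
    """Cost of a parallel map over n elements: work adds up across
    elements, depth is the per-element depth (all run in parallel)."""
    work = n * elem_work
    depth = elem_depth
    return work, depth

def greedy_schedule_time(work, depth, p):
    """Brent-style bound: time on p processors is O(work/p + depth)."""
    return work // p + depth

w, d = par_map_cost(n=1_000_000, elem_work=1, elem_depth=1)
print(greedy_schedule_time(w, d, p=100))  # 10001
```

With ample parallelism (w/p dominating d), the bound gives the linear speedup the abstract claims.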
Algorithm + Strategy = Parallelism
 JOURNAL OF FUNCTIONAL PROGRAMMING
, 1998
Abstract

Cited by 56 (20 self)
The process of writing large parallel programs is complicated by the need to specify both the parallel behaviour of the program and the algorithm that is to be used to compute its result. This paper introduces evaluation strategies, lazy higher-order functions that control the parallel evaluation of non-strict functional languages. Using evaluation strategies, it is possible to achieve a clean separation between algorithmic and behavioural code. The result is enhanced clarity and shorter parallel programs. Evaluation strategies are a very general concept: this paper shows how they can be used to model a wide range of commonly used programming paradigms, including divide-and-conquer, pipeline parallelism, producer/consumer parallelism, and data-oriented parallelism. Because they are based on unrestricted higher-order functions, they can also capture irregular parallel structures. Evaluation strategies are not just of theoretical interest: they have evolved out of our experience in parallelising several large-scale applications, where they have proved invaluable in helping to manage the complexities of parallel behaviour. These applications are described in detail here. The largest application we have studied to date, Lolita, is a 60,000-line natural language parser. Initial results show that for these applications we can achieve acceptable parallel performance, while incurring minimal overhead for using evaluation strategies.
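The separation the paper describes, algorithmic code in one place and evaluation behaviour in another, can be loosely mimicked outside Haskell. In this Python sketch, `seq_strategy` and `par_strategy` are invented names standing in for the paper's lazy higher-order strategies; the algorithm itself never changes.

```python
# A loose Python analogue of "evaluation strategies": the algorithm
# (a plain function) is separated from the code deciding how its parts
# are evaluated. The paper's strategies are Haskell functions; these
# names are invented for illustration.

from concurrent.futures import ThreadPoolExecutor

def seq_strategy(f, xs):
    """Evaluate each element sequentially."""
    return [f(x) for x in xs]

def par_strategy(f, xs, workers=4):
    """Evaluate elements in parallel; same result, different behaviour."""
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return list(ex.map(f, xs))

algorithm = lambda x: x * x                 # algorithmic code only
print(seq_strategy(algorithm, range(5)))    # [0, 1, 4, 9, 16]
print(par_strategy(algorithm, range(5)))    # [0, 1, 4, 9, 16]
```

Swapping the strategy changes parallel behaviour without touching the algorithm, which is the clean separation the abstract advertises.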
Parallel Programming using Functional Languages
, 1991
Abstract

Cited by 48 (3 self)
I am greatly indebted to Simon Peyton Jones, my supervisor, for his encouragement and technical assistance. His overwhelming enthusiasm was of great support to me. I particularly want to thank Simon and Geoff Burn for commenting on earlier drafts of this thesis. Through his excellent lecturing, Colin Runciman initiated my interest in functional programming. I am grateful to Phil Trinder for his simulator, on which mine is based, and Will Partain for his help with LaTeX and graphs. I would like to thank the Science and Engineering Research Council of Great Britain for their financial support. Finally, I would like to thank Michelle, whose culinary skills supported me whilst I was writing up.

The Imagination
the only nation worth defending
a nation without alienation
a nation whose flag is invisible
and whose borders are forever beyond the horizon
a nation whose motto is
why have one or the other
when you can have one the other and both
A Provably Time-Efficient Parallel Implementation of Full Speculation
 In Proceedings of the 23rd ACM Symposium on Principles of Programming Languages
, 1996
Abstract

Cited by 17 (5 self)
Speculative evaluation, including leniency and futures, is often used to produce high degrees of parallelism. Existing speculative implementations, however, may serialize computation because of their implementation of queues of suspended threads. We give a provably efficient parallel implementation of a speculative functional language on various machine models. The implementation includes proper parallelization of the necessary queuing operations on suspended threads. Our target machine models are a butterfly network, hypercube, and PRAM. To prove the efficiency of our implementation, we provide a cost model using a profiling semantics and relate the cost model to implementations on the parallel machine models.

1 Introduction
Futures, lenient languages, and several implementations of graph reduction for lazy languages all use speculative evaluation (call-by-speculation [15]) to expose parallelism. The basic idea of speculative evaluation, in this context, is that the evaluation of a...
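The future/touch interface behind this kind of speculation can be sketched in Python. Here `future` and `touch` are illustrative wrappers around `concurrent.futures`, which manages the queue of suspended waiters internally; the paper's contribution is precisely that this queuing can itself be parallelized rather than serializing the computation.

```python
# Futures-style speculation sketched in Python: a value's computation
# starts eagerly, and a consumer that "touches" it before it is ready
# suspends until it resolves. (Invented wrapper names; not the paper's
# implementation.)

from concurrent.futures import ThreadPoolExecutor

ex = ThreadPoolExecutor()

def future(thunk):
    return ex.submit(thunk)           # start the computation speculatively

def touch(fut):
    return fut.result()               # suspend until the value is ready

f = future(lambda: sum(range(1000)))  # runs alongside the work below
other_work = 2 + 2
print(touch(f) + other_work)          # 499504
```

The speculative computation overlaps with `other_work`; `touch` is the only point at which a consumer may block.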
Calculating Lenient Programs' Performance
 Functional Programming, Glasgow 1990, Workshops in computing
, 1990
Abstract

Cited by 7 (0 self)
Lenient languages, such as Id Nouveau, have been proposed for programming parallel computers. These languages represent a compromise between strict and lazy languages. The operation of parallel languages is very complex; therefore a formal method for reasoning about their performance is desirable. This paper presents a non-standard denotational semantics for calculating the performance of lenient programs. The semantics is novel in its use of time and timestamps.
A Parallel Complexity Model for Functional Languages
 IN: PROC. ACM CONF. ON FUNCTIONAL PROGRAMMING LANGUAGES AND COMPUTER ARCHITECTURE
, 1994
Abstract

Cited by 5 (2 self)
A complexity model based on the λ-calculus with an appropriate operational semantics is presented and related to various parallel machine models, including the PRAM and hypercube models. The model is used to study parallel algorithms in the context of "sequential" functional languages, and to relate these results to algorithms designed directly for parallel machine models. For example, the paper shows that equally good upper bounds can be achieved for merging two sorted sequences in the pure λ-calculus with some arithmetic constants as in the EREW PRAM, when they are both mapped onto a more realistic machine such as a hypercube or butterfly network. In particular, for n keys and p processors, they both result in an O(n/p + log² p) time algorithm. These results argue that it is possible to get good parallelism in functional languages without adding explicitly parallel constructs. In fact, the lack of random access seems to be a bigger problem than the lack of parallelism. This research...
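The merging result quoted above rests on a divide-and-conquer merge whose two recursive calls are independent, and hence parallelizable. Below is a sequential Python sketch of that algorithm family (not the paper's λ-calculus formulation); the recursion splits one sequence at its median and binary-searches the matching split point in the other.

```python
# Divide-and-conquer merge of two sorted sequences. Both recursive
# calls are independent, which is what yields logarithmic depth when
# they are run in parallel. (Sequential illustrative sketch.)

import bisect

def merge(a, b):
    if not a:
        return list(b)
    if not b:
        return list(a)
    mid = len(a) // 2
    pivot = a[mid]
    j = bisect.bisect_left(b, pivot)   # split point of b around pivot
    # The two calls below touch disjoint halves -> parallelizable.
    left = merge(a[:mid], b[:j])
    right = merge(a[mid + 1:], b[j:])
    return left + [pivot] + right

print(merge([1, 3, 5, 7], [2, 4, 6]))  # [1, 2, 3, 4, 5, 6, 7]
```

Each level of recursion halves one input, so the parallel depth is polylogarithmic, in line with the O(n/p + log² p) bound cited in the abstract.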
Programming Language Expressiveness and Circuit Complexity
 in: Internat. Conf. on the Mathematical Foundations of Programming Semantics
, 1996
Abstract

Cited by 4 (2 self)
This paper is a continuation of the work begun in [5] on establishing relative intensional expressiveness results for programming languages. Language L1 is intensionally more expressive than L2 if L1 can compute all the functions L2 can, with at least the same asymptotic complexity. The question we address is: does nondeterministic parallelism add intensional expressiveness? We compare deterministic and nondeterministic extensions of PCF, a simple functional programming language. We develop further the circuit semantics from our earlier work, and establish a connection between parallel PCF programs and boolean circuits. Using results from circuit complexity, and assuming hardware which can detect undefined inputs, we show that nondeterministic parallelism is indeed intensionally more expressive. More precisely, we show that nondeterministic parallelism can lead to exponentially faster programs, and also programs that do exponentially less work.

1 Introduction
We conduct an inves...
How Much Non-strictness do Lenient Programs Require?
 In Conf. on Func. Prog. Languages and Computer Architecture
, 1995
Abstract

Cited by 3 (0 self)
Lenient languages, such as Id90, have been touted as among the best functional languages for massively parallel machines [AHN88]. Lenient evaluation combines non-strict semantics with eager evaluation [Tra91]. Non-strictness gives these languages more expressive power than strict semantics, while eager evaluation ensures the highest degree of parallelism. Unfortunately, non-strictness incurs a large overhead, as it requires dynamic scheduling and synchronization. As a result, many powerful program analysis techniques have been developed to statically determine when non-strictness is not required [CPJ85, Tra91, Sch94]. This paper studies a large set of lenient programs and quantifies the degree of non-strictness they require. We identify several forms of non-strictness, including functional, conditional, and data structure non-strictness. Surprisingly, most Id90 programs require neither functional nor conditional non-strictness. Many benchmark programs, however, make use of a limited fo...
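Data-structure non-strictness, one of the forms the paper identifies, can be imitated with explicit thunks: a structure is built and partly consumed before all of its fields have been computed. The Python sketch below is purely illustrative (it is not Id90, and the memoizing `thunk` helper is invented); it also hints at the dynamic scheduling overhead the abstract mentions, since every field access goes through a check.

```python
# Data-structure non-strictness via explicit thunks: a pair whose
# second field's computation never runs, because no consumer forces it.
# Strict evaluation would have to run both fields up front.

def thunk(f):
    """Delay f(); memoize its result on first force."""
    cell = {}
    def force():
        if 'v' not in cell:
            cell['v'] = f()
        return cell['v']
    return force

def diverge():                        # stands in for a non-terminating field
    raise RuntimeError("forced a diverging computation")

pair = (thunk(lambda: 42), thunk(diverge))
print(pair[0]())                      # 42 -- second field never forced
```

A strict language would fail building `pair`; leniency (eager but non-strict) would start both fields yet still let the first be consumed independently.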
Formal Methods for Intellectual Property Composition Across Synchronization Domains
, 2007
Abstract

Cited by 1 (0 self)
A significant part of the System-on-a-Chip (SoC) design problem is in the correct composition of intellectual property (IP) blocks. Ever increasing clock frequencies make it impossible for signals to reach from one end of the chip to the other within a clock cycle; this invalidates the so-called synchrony assumption, where the timing of computation and communication is assumed to be negligible, happening within a clock cycle. Missing the timing deadline causes this violation, and may have ramifications on the overall system reliability. Although latency insensitive protocols (LIPs) have been proposed as a solution to the problem of signal propagation over long interconnects, they have their own limitations. A more generic solution comes in the form of globally asynchronous locally synchronous (GALS) designs. However, composing synchronous IP blocks either over long multi-cycle delay interconnects or over asynchronous communication links for a GALS design is a challenging task, especially for ensuring the functional correctness of the overall design. In this thesis, we analyze various solutions for solving the synchronization problems related with IP composition. We present alternative LIPs, and provide a validation framework for ensuring their correctness. Our notion of correctness is that of latency equivalence between a latency insensitive design and its synchronous counterpart. We propose a trace-based framework
Thesis Proposal: Scheduling Parallel Functional Programs
, 2007
Abstract
Parallelism abounds! To continue to improve performance, programmers must use parallel algorithms to take advantage of multicore and other parallel architectures. Existing declarative languages allow programmers to express these parallel algorithms concisely. With a deterministic semantics, a declarative language also allows programmers to reason about the correctness of programs independently of the language implementation. Despite this, the performance of these programs still relies heavily on the language implementation and especially on the choice of scheduling policy. In this thesis, I propose to use a cost semantics to allow programmers to reason about the performance of parallel programs and in particular about their use of space. This cost semantics also provides a specification for the language implementation. In my previous work, I have formalized several implementations, including different scheduling policies, as small-step transition semantics. Incorporating these policies into the language semantics establishes a tight link between programs, scheduling policies, and performance. Using these semantics, I have shown that in some cases, the choice of scheduling policy has an asymptotic effect on memory use. In my continuing work, I will consider extensions to my language and develop a full-scale implementation. With these, I hope to demonstrate that a declarative language is a practical way to program parallel algorithms and that my cost semantics offers an effective means to reason about their performance.
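The claim that scheduling policy can have an asymptotic effect on space is easy to see on a toy task tree: a breadth-first scheduler keeps an entire level of ready tasks live at once, while a depth-first one keeps only a path from root to leaf. The Python sketch below (invented names; a toy model, not the proposal's cost semantics) measures the peak size of the ready queue under each policy.

```python
# Peak ready-queue size on a complete binary task tree of the given
# depth: depth-first scheduling keeps O(depth) tasks live,
# breadth-first keeps a whole level, ~2^(depth-1) tasks.

from collections import deque

def max_frontier(depth, dfs):
    queue = deque([0])            # each task is represented by its level
    peak = 0
    while queue:
        peak = max(peak, len(queue))
        level = queue.pop() if dfs else queue.popleft()
        if level + 1 < depth:     # spawn two children below this level
            queue.append(level + 1)
            queue.append(level + 1)
    return peak

print(max_frontier(10, dfs=True))   # 10  (linear in depth)
print(max_frontier(10, dfs=False))  # 512 (exponential in depth)
```

Same tree, same work, same result; only the order in which ready tasks are dequeued changes, and with it the live memory, which is the asymmetry the thesis proposes to capture formally.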