Algorithm + Strategy = Parallelism
- Journal of Functional Programming, 1998
Cited by 51 (18 self)
Abstract
The process of writing large parallel programs is complicated by the need to specify both the parallel behaviour of the program and the algorithm that is to be used to compute its result. This paper introduces evaluation strategies, lazy higher-order functions that control the parallel evaluation of non-strict functional languages. Using evaluation strategies, it is possible to achieve a clean separation between algorithmic and behavioural code. The result is enhanced clarity and shorter parallel programs. Evaluation strategies are a very general concept: this paper shows how they can be used to model a wide range of commonly used programming paradigms, including divide-and-conquer, pipeline parallelism, producer/consumer parallelism, and data-oriented parallelism. Because they are based on unrestricted higher-order functions, they can also capture irregular parallel structures. Evaluation strategies are not just of theoretical interest: they have evolved out of our experience in parallelising several large-scale applications, where they have proved invaluable in helping to manage the complexities of parallel behaviour. These applications are described in detail here. The largest application we have studied to date, Lolita, is a 60,000-line natural language parser. Initial results show that for these applications we can achieve acceptable parallel performance while incurring minimal overhead for using evaluation strategies.
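The separation of algorithmic and behavioural code described above can be illustrated with a minimal Python sketch. It is not the paper's Haskell library: Python is strict, so zero-argument functions (thunks) stand in for lazy values, and the names `using`, `seq_list` and `par_list` only loosely echo the strategy combinators. The algorithm (`squares`) builds delayed computations; a separately chosen strategy decides whether they are evaluated sequentially or on a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Python is strict, so a zero-argument function (a "thunk") stands in for a
# lazy, not-yet-evaluated value. These names are illustrative assumptions.
Thunk = Callable[[], object]
Strategy = Callable[[List[Thunk]], list]


def seq_list(thunks: List[Thunk]) -> list:
    """Strategy: evaluate every thunk in order on the calling thread."""
    return [t() for t in thunks]


def par_list(workers: int = 4) -> Strategy:
    """Build a strategy that evaluates the thunks on a pool of threads."""
    def strategy(thunks: List[Thunk]) -> list:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = [pool.submit(t) for t in thunks]
            return [f.result() for f in futures]  # results keep input order
    return strategy


def using(thunks: List[Thunk], strategy: Strategy) -> list:
    """Apply an evaluation strategy to the output of an algorithm."""
    return strategy(thunks)


# Algorithmic code: says *what* to compute and nothing about parallelism.
def squares(xs) -> List[Thunk]:
    return [lambda x=x: x * x for x in xs]


# Behavioural code: chosen separately, without touching `squares`.
seq_result = using(squares(range(5)), seq_list)
par_result = using(squares(range(5)), par_list(workers=2))
print(seq_result)                # [0, 1, 4, 9, 16]
print(par_result == seq_result)  # True
```

Because of CPython's global interpreter lock, a thread-pool strategy mostly helps I/O-bound work; the point of the sketch is only that the strategy can be swapped without editing `squares`.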
A compiler controlled Threaded Abstract Machine
- Journal of Parallel and Distributed Computing, 1993
Abstract interpretation based formal methods and future challenges, invited paper
- Informatics - 10 Years Back, 10 Years Ahead, volume 2000 of Lecture Notes in Computer Science, 2001
Cited by 22 (6 self)
Abstract
In order to contribute to the solution of the software reliability problem, tools have been designed to statically analyze the run-time behavior of programs. Because the correctness problem is undecidable, some form of approximation is needed. The purpose of abstract interpretation is to formalize this idea of approximation. We informally illustrate the application of abstraction to the semantics of programming languages as well as to static program analysis. The main point is that in order to reason or compute about a complex system, some information must be lost; that is, the observation of executions must be either partial or at a high level of abstraction. In the second part of the paper, we compare static program analysis with deductive methods, model-checking and type inference. Their foundational ideas are briefly reviewed, and the shortcomings of these four methods are discussed, including when they should be combined. Alternatively, since program debugging is still the main program verification...
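The idea that an analysis must lose information can be made concrete with the classic sign domain. In this sketch (an illustration of our own, not taken from the paper), integers are abstracted to negative, zero, positive or unknown (`TOP`, our name for "any sign"), and arithmetic is re-implemented on those abstract values. The result is sound, since every concrete result is covered by the abstract one, but imprecise: the sum of a positive and a negative number can only be reported as `TOP`.

```python
from enum import Enum


class Sign(Enum):
    NEG = "-"
    ZERO = "0"
    POS = "+"
    TOP = "?"   # sign unknown: the information the analysis gave up


def alpha(n: int) -> Sign:
    """Abstraction function: keep only the sign of a concrete integer."""
    if n < 0:
        return Sign.NEG
    if n == 0:
        return Sign.ZERO
    return Sign.POS


def abs_add(a: Sign, b: Sign) -> Sign:
    """Abstract addition on signs."""
    if a is Sign.ZERO:
        return b
    if b is Sign.ZERO:
        return a
    if a is b and a is not Sign.TOP:
        return a               # (+) + (+) = (+), (-) + (-) = (-)
    return Sign.TOP            # (+) + (-) could be anything


def abs_mul(a: Sign, b: Sign) -> Sign:
    """Abstract multiplication on signs."""
    if Sign.ZERO in (a, b):
        return Sign.ZERO       # 0 * anything = 0, even an unknown value
    if Sign.TOP in (a, b):
        return Sign.TOP
    return Sign.POS if a is b else Sign.NEG


def covers(abstract: Sign, n: int) -> bool:
    """Soundness check: does the abstract value describe the concrete one?"""
    return abstract is Sign.TOP or alpha(n) is abstract


x, y = -3, 5
print(abs_mul(alpha(x), alpha(y)))   # Sign.NEG, exact: -15 is negative
print(abs_add(alpha(x), alpha(y)))   # Sign.TOP, sound but imprecise (x + y = 2)
```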
Separation Constraint Partitioning - A New Algorithm for Partitioning Non-strict Programs into Sequential Threads
- In Conference Record of the 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 1995
Cited by 19 (1 self)
Abstract
In this paper we present substantially improved thread partitioning algorithms for modern implicitly parallel languages. We present a new block partitioning algorithm, separation constraint partitioning, which is both more powerful and more flexible than previous algorithms. Our algorithm is guaranteed to derive maximal threads. We present a theoretical framework for proving the correctness of our partitioning approach, and we show how separation constraint partitioning makes interprocedural partitioning viable. We have implemented the partitioning algorithms in an Id90 compiler for workstations and parallel machines. Using this experimental platform, we quantify the effectiveness of different partitioning schemes on whole applications.
1 Introduction
Modern implicitly parallel languages, such as the functional language Id90, allow the elegant formulation of a broad class of problems while exposing substantial parallelism. However, their non-strict semantics require fine-grain dynami...
MIMD-Style Parallel Programming Based on Continuation-Passing Threads
- Massachusetts Institute of Technology, Laboratory for Computer Science, 1994
Cited by 18 (3 self)
Abstract
Today's message-passing architectures are characterized by high communication costs, and they typically lack hardware support for synchronization and scheduling. These deficiencies present a severe obstacle to obtaining efficient implementations of parallel applications whose communication patterns are either highly irregular or dependent on dynamic information. In this paper we present a model based on continuation-passing threads in which we try to overcome these difficulties. The model incorporates two effective software mechanisms: one lengthens sequential threads to offset the costs of dynamic scheduling, and the other preserves the locality of computations to reduce network traffic. The model is currently implemented as a C language extension, together with a runtime system on the CM-5 that embodies a work-stealing scheduler. Real-world applications written in this package, such as ray-tracing and protein folding, have shown impressive speedup res...
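Work stealing, the scheduling discipline named above, can be sketched as a small single-threaded simulation. This is an illustration under assumptions, not the CM-5 runtime: each hypothetical `Worker` owns a deque, pushes and pops its own tasks at one end, and when idle steals from the other end of a randomly chosen victim. Tasks model a binary divide-and-conquer computation, where a task is just its remaining depth.

```python
import random
from collections import deque


class Worker:
    """One processor with its own double-ended queue of ready tasks."""

    def __init__(self, wid: int):
        self.wid = wid
        self.tasks = deque()
        self.leaves_done = 0

    def push(self, task: int) -> None:
        self.tasks.append(task)           # owner adds at the right end

    def pop(self):
        # Owner takes the newest task (right end): depth-first order.
        return self.tasks.pop() if self.tasks else None

    def steal_from(self, victim: "Worker"):
        # Thief takes the oldest task (left end), which in divide-and-conquer
        # code is usually the largest remaining piece of work.
        return victim.tasks.popleft() if victim.tasks else None


def run(num_workers: int, depth: int, seed: int = 0):
    """Simulate a binary divide-and-conquer computation of the given depth.
    Depth 0 is a leaf. Returns (leaves per worker, number of steals)."""
    rng = random.Random(seed)
    workers = [Worker(i) for i in range(num_workers)]
    workers[0].push(depth)                # the root task starts on worker 0
    steals = 0
    while any(w.tasks for w in workers):
        for w in workers:                 # one step per worker per round
            task = w.pop()
            if task is None:
                others = [v for v in workers if v is not w]
                if not others:
                    continue
                task = w.steal_from(rng.choice(others))
                if task is None:
                    continue              # failed steal: try again next round
                steals += 1
            if task > 0:
                w.push(task - 1)          # spawn two subtasks locally
                w.push(task - 1)
            else:
                w.leaves_done += 1        # leaf: real work would happen here
    return [w.leaves_done for w in workers], steals


counts, steals = run(num_workers=4, depth=6)
print(sum(counts))  # 64 leaves, spread across the workers
```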
Generation and Quantitative Evaluation of Dataflow Clusters
- 1993
Cited by 11 (7 self)
Abstract
Multithreaded or hybrid von Neumann/dataflow execution models have an advantage over the fine-grain dataflow model in that they significantly reduce the run-time overhead incurred by matching. In this paper, we look at two issues related to the evaluation of a coarse-grain dataflow model of execution. The first issue concerns the compilation of fine-grain code into coarse-grain code. In this study, the concept of coarse-grain code is captured by clusters, which can be thought of as mini-dataflow graphs that execute strictly, deterministically and without blocking. We look at two bottom-up algorithms for partitioning dataflow graphs into clusters: the basic block method and the dependence sets method. The second issue is the actual performance of cluster-based execution as several architecture parameters are varied (e.g. number of processors, matching cost, network latency). From the extensive simulation data we evaluate (1) the potential speedup over the fine-grain execution and (2...
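The dependence-set idea can be sketched as follows; this is a simplified reconstruction for illustration, not the paper's algorithm. Each node is labelled with the set of external inputs it transitively depends on, and nodes with identical labels are grouped into one cluster. Because an edge m -> n implies that deps(m) is a subset of deps(n), any edge between two different clusters goes from a smaller set to a strictly larger one, so the clusters themselves form an acyclic graph. The function names and the example graph are hypothetical.

```python
from collections import defaultdict


def topo_order(graph):
    """Kahn's algorithm: order the nodes of a DAG so edges point forward."""
    indegree = {n: 0 for n in graph}
    for n in graph:
        for s in graph[n]:
            indegree[s] += 1
    ready = [n for n in graph if indegree[n] == 0]
    order = []
    while ready:
        n = ready.pop()
        order.append(n)
        for s in graph[n]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    return order


def dependence_set_clusters(graph, inputs):
    """graph: node -> list of successors (a dataflow DAG).
    inputs: nodes whose values arrive from outside (e.g. by message).
    Returns {dependence set: [nodes in that cluster, in topological order]}."""
    order = topo_order(graph)
    deps = {n: set() for n in graph}
    for i in inputs:
        deps[i].add(i)
    for n in order:                    # predecessors are finished first
        for s in graph[n]:
            deps[s] |= deps[n]
    clusters = defaultdict(list)
    for n in order:
        clusters[frozenset(deps[n])].append(n)
    return dict(clusters)


# i1 and i2 arrive separately; b needs both, a and c need only i1.
graph = {"i1": ["a"], "i2": ["b"], "a": ["b", "c"], "c": [], "b": []}
for key, nodes in dependence_set_clusters(graph, ["i1", "i2"]).items():
    print(sorted(key), nodes)
```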
Code Generations, Evaluations, and Optimizations in Multithreaded Executions
- 1995
Cited by 10 (0 self)
Abstract
Efficient large-scale parallel processing can result only from proper handling of latency. Latency arises either from remote memory accesses or from synchronizations. Multithreading is an execution model that can effectively deal with latency by switching among a set of ready threads. This model has been proposed in a variety of forms: a unit of storage can be based on either a collection of threads or a single thread, threads can be either blocking or non-blocking, and synchronization can be either implicit or explicit. This dissertation describes research in the evaluation and optimization of various issues in multithreading. Issues of particular interest are the development of a multithreaded execution model to be used as a test-bed, and a hybrid code generation scheme where threads are generated in a top-down manner and then optimized in a bottom-up fashion. Various forms of locality are also ide...
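How switching among ready threads hides latency can be shown with a toy single-processor model. It assumes every thread computes for a fixed number of cycles and then blocks on a remote access of fixed latency, with no network contention; the function name and all parameters are illustrative assumptions, not taken from the dissertation.

```python
def utilization(n_threads: int, run_len: int, latency: int,
                switch_cost: int, horizon: int = 10_000) -> float:
    """Fraction of cycles a single processor spends doing useful work when
    each thread computes for `run_len` cycles, then blocks on a remote access
    that takes `latency` cycles. Switching threads costs `switch_cost`."""
    ready_at = [0] * n_threads   # cycle when each thread can run again
    t = 0
    busy = 0
    current = None
    while t < horizon:
        ready = [i for i in range(n_threads) if ready_at[i] <= t]
        if not ready:
            t = min(ready_at)    # every thread is waiting: processor idles
            continue
        nxt = min(ready, key=lambda i: ready_at[i])  # longest-waiting thread
        if current is not None and nxt != current:
            t += switch_cost     # context-switch overhead
        current = nxt
        t += run_len
        busy += run_len
        ready_at[nxt] = t + latency  # issue remote access and block
    return busy / t


for n in (1, 2, 5, 10):
    print(n, round(utilization(n, run_len=10, latency=90, switch_cost=0), 2))
# 1 0.1 / 2 0.2 / 5 0.5 / 10 1.0
```

With `run_len=10` and `latency=90`, utilization grows in proportion to the number of threads until ten threads cover the whole latency; a nonzero `switch_cost` then caps it near `run_len / (run_len + switch_cost)`.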
Thread Partitioning and Scheduling Based on Cost Model
- 1997
Cited by 10 (7 self)
Abstract
There has been considerable interest in implementing a multithreaded program execution and architecture model on a multiprocessor whose primary processors consist of today's off-the-shelf microprocessors. Unlike some custom-designed multithreaded processor architectures, which can interleave multiple threads concurrently, conventional processors can only execute one thread at a time. This presents a unique and challenging problem to the compiler: partition a program into threads so that it executes both correctly and in minimal time. We present a new heuristic algorithm based on an interesting extension of the classical list scheduling algorithm. Based on a cost model, our algorithm groups instructions into threads by considering the trade-offs among parallelism, latency tolerance, thread switching costs and sequential execution efficiency. The proposed algorithm has been implemented, and its performance measured through experiments on a variety of architecture parameters a...
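The algorithm above extends classical list scheduling. For reference, here is a sketch of that classical baseline (not the authors' extension). Nodes of a task graph are prioritised by their bottom level, the longest path to an exit node, and each ready node is placed on the processor where it can start earliest. Communication costs, thread-switching costs and latency, which the paper's cost model accounts for, are ignored here. The function name and example graph are hypothetical.

```python
def list_schedule(cost, succs, num_procs):
    """cost: node -> execution time; succs: node -> successors (a DAG).
    Returns ({node: (processor, start_time)}, makespan)."""
    preds = {n: [] for n in cost}
    for n, ss in succs.items():
        for s in ss:
            preds[s].append(n)

    # Priority: bottom level = longest path from the node to an exit node.
    level = {}

    def bottom_level(n):
        if n not in level:
            level[n] = cost[n] + max((bottom_level(s) for s in succs[n]),
                                     default=0)
        return level[n]

    for n in cost:
        bottom_level(n)

    proc_free = [0] * num_procs   # time at which each processor is idle
    finish, schedule = {}, {}
    remaining = set(cost)
    while remaining:
        ready = [n for n in remaining if all(p in finish for p in preds[n])]
        # Highest bottom level first; ties broken by name for determinism.
        n = max(ready, key=lambda x: (level[x], str(x)))
        earliest = max((finish[p] for p in preds[n]), default=0)
        proc = min(range(num_procs),
                   key=lambda i: max(proc_free[i], earliest))
        start = max(proc_free[proc], earliest)
        schedule[n] = (proc, start)
        finish[n] = start + cost[n]
        proc_free[proc] = finish[n]
        remaining.remove(n)
    return schedule, max(finish.values())


cost = {"a": 2, "b": 3, "c": 1, "d": 2}
succs = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
schedule, makespan = list_schedule(cost, succs, num_procs=2)
print(makespan)   # 7 (a single processor would need 8)
```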
Compilation Techniques for Parallel Systems
- Parallel Computing, 1999
Cited by 8 (0 self)
Abstract
Over the past two decades, tremendous progress has been made both in the design of parallel architectures and in the compilers needed to exploit parallelism on such architectures. In this paper we summarize the advances in compilation techniques for uncovering and effectively exploiting parallelism at various levels of granularity. We begin by describing the program analysis techniques through which parallelism is detected and expressed in the form of a program representation. Next, compilation techniques for scheduling instruction-level parallelism are discussed, along with the relationship between the nature of compiler support and the type of processor architecture. Compilation techniques for exploiting loop-level and task-level parallelism on shared-memory multiprocessors are summarized. Locality optimizations that must be used in conjunction with parallelization techniques to achieve high performance on machines with complex memory hierarchies are also discussed. Finally, we provide an...
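The abstract does not name specific analyses. As one well-known example of how a compiler can detect loop-level parallelism, here is a sketch of the GCD dependence test. It checks whether a write `A[a*i + b]` and a read `A[c*i + d]` in the same loop could ever refer to the same array element; the function name is hypothetical.

```python
from math import gcd


def gcd_test(a: int, b: int, c: int, d: int) -> bool:
    """Loop body writes A[a*i + b] and reads A[c*i + d].

    A dependence needs integers i1, i2 with a*i1 + b == c*i2 + d, that is
    a*i1 - c*i2 == d - b. This linear Diophantine equation has an integer
    solution iff gcd(a, c) divides d - b. Returns False only when the two
    accesses can never touch the same element; True means "may depend",
    since loop bounds are ignored."""
    g = gcd(a, c)
    if g == 0:                 # both subscripts are constants
        return b == d
    return (d - b) % g == 0


# for i: A[2*i] = A[2*i + 1] + 1   -> even vs odd elements: independent
print(gcd_test(2, 0, 2, 1))   # False, iterations can run in parallel
# for i: A[2*i] = A[4*i + 2] + 1   -> may overlap
print(gcd_test(2, 0, 4, 2))   # True
```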
Empirical Study of a Dataflow Language on the CM-5
- Advanced Topics in Dataflow Computing and Multithreading, 1994
Cited by 6 (0 self)
Abstract
This paper presents empirical data on the behavior of large dataflow programs on a distributed memory multiprocessor. The programs, written in the dataflow language Id90, are compiled via a Threaded Abstract Machine (TAM) for the CM-5. TAM refines dataflow execution models by addressing critical constraints that modern parallel architectures place on the compilation of general-purpose parallel programming languages. It exposes synchronization, scheduling, and network access so that the compiler can optimize against the cost of these operations.
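The synchronization that TAM exposes to the compiler can be sketched with per-thread entry counts. The model below is loosely based on the TAM concepts of activation frames, inlets (message handlers) and a per-frame list of enabled threads; the class and method names are hypothetical, and the sketch ignores scheduling across frames and the network.

```python
class Frame:
    """Activation frame: value slots, entry counts for synchronizing threads,
    and the list of threads that are enabled but not yet run."""

    def __init__(self, entry_counts):
        self.slots = {}
        self.counters = dict(entry_counts)   # thread name -> arrivals still needed
        self.enabled = []

    def fork(self, thread: str) -> None:
        """Signal one arrival; enable the thread when its count reaches zero."""
        self.counters[thread] -= 1
        if self.counters[thread] == 0:
            self.enabled.append(thread)

    def inlet(self, slot: str, value, thread: str) -> None:
        """Message handler: store the incoming value, then fork the thread."""
        self.slots[slot] = value
        self.fork(thread)


def run_enabled(frame: Frame, threads) -> None:
    """Run enabled threads until none remain; threads never block midway."""
    while frame.enabled:
        name = frame.enabled.pop()
        threads[name](frame)


def add_thread(frame: Frame) -> None:
    frame.slots["z"] = frame.slots["x"] + frame.slots["y"]


frame = Frame({"add": 2})          # "add" needs two values before it may run
frame.inlet("x", 3, "add")
run_enabled(frame, {"add": add_thread})
print("z" in frame.slots)          # False: only one of two arrivals so far
frame.inlet("y", 4, "add")
run_enabled(frame, {"add": add_thread})
print(frame.slots["z"])            # 7
```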

