Results 1–10 of 10
Higher-Order Unification via Combinators
Theoretical Computer Science, 1993
Abstract

Cited by 9 (1 self)
We present an algorithm for unification in the simply typed lambda calculus which enumerates complete sets of unifiers using a finitely branching search space. In fact, the types of terms may contain type variables, so that a solution may involve type substitution as well as term substitution. The problem is first translated into the problem of unification with respect to extensional equality in combinatory logic, and the algorithm is defined in terms of transformations on systems of combinatory terms. These transformations are based on a new method (itself based on systems) for deciding extensional equality between typed combinatory logic terms.
1 Introduction
This paper develops a new algorithm for higher-order unification. A higher-order unification problem is specified by two terms F and G of the explicitly simply typed lambda calculus LC; a solution is a substitution σ such that σF =βη σG. We will always assume the extensionality axiom η in this paper. In fact we tre...
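The translation from lambda terms into combinatory logic that the abstract describes is classically done by bracket abstraction. The following is a minimal sketch of that standard translation, not the paper's own algorithm; the tuple term representation and function names are invented for illustration.

```python
# Bracket abstraction: the classic lambda-to-SKI translation (illustrative
# representation, not the paper's). Terms are tuples:
#   ('var', name) | ('app', fun, arg) | ('lam', name, body)

def free_in(x, t):
    """Is variable x free in term t?"""
    if t[0] == 'var':
        return t[1] == x
    if t[0] == 'app':
        return free_in(x, t[1]) or free_in(x, t[2])
    return t[1] != x and free_in(x, t[2])       # 'lam' case

def abstract(x, t):
    """[x]t: eliminate x from a lambda-free term t using S, K, I."""
    if t == ('var', x):
        return ('var', 'I')                     # [x]x = I
    if not free_in(x, t):
        return ('app', ('var', 'K'), t)         # [x]M = K M, if x not free in M
    # remaining case: t is an application
    return ('app', ('app', ('var', 'S'), abstract(x, t[1])),
            abstract(x, t[2]))                  # [x](M N) = S ([x]M) ([x]N)

def translate(t):
    """Eliminate all lambdas, innermost first."""
    if t[0] == 'var':
        return t
    if t[0] == 'app':
        return ('app', translate(t[1]), translate(t[2]))
    return abstract(t[1], translate(t[2]))      # 'lam' case

# λx.x becomes I; λx.λy.x becomes a term built only from S, K, I
print(translate(('lam', 'x', ('var', 'x'))))
print(translate(('lam', 'x', ('lam', 'y', ('var', 'x')))))
```

Once terms are combinators, unification can proceed by transformations on systems of combinatory terms, as the abstract describes.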
A Fresh Look at Combinator Graph Reduction (Or, Having a TIGRE by the Tail)
SIGPLAN Notices, 1989
Abstract

Cited by 4 (0 self)
We present a new abstract machine for graph reduction called TIGRE. Benchmark results show that TIGRE's execution speed compares quite favorably with previous combinator graph reduction techniques on similar hardware. Furthermore, the mapping of TIGRE onto conventional hardware is simple and efficient. Mainframe implementations of TIGRE provide performance levels exceeding those previously available on custom graph reduction hardware.
The Reduceron Reconfigured
Abstract

Cited by 3 (1 self)
The leading implementations of graph reduction all target conventional processors designed for low-level imperative execution. In this paper, we present a processor specially designed to perform graph reduction. Our processor – the Reduceron – is implemented using off-the-shelf reconfigurable hardware. We highlight the low-level parallelism present in sequential graph reduction, and show how parallel memories and dynamic analyses are used in the Reduceron to achieve an average reduction rate of 0.55 function applications per clock cycle.
Categories and Subject Descriptors: C.1.3 [Processor Architectures]: Other Architecture Styles—High-level language architectures;
Experience with a Clustered Parallel Reduction Machine
, 1993
Abstract
A clustered architecture has been designed to exploit divide-and-conquer parallelism in functional programs. The programming methodology developed for the machine is based on explicit annotations and program transformations. It has been successfully applied to a number of algorithms, resulting in a benchmark of small and medium-size parallel functional programs. Sophisticated compilation techniques are used, such as strictness analysis on non-flat domains and RISC and VLIW code generation. Parallel jobs are distributed by an efficient hierarchical scheduler. A special processor for graph reduction has been designed as a basic building block for the machine. A prototype of a single-cluster machine has been constructed with stock hardware. This paper describes the experience with the project and its current state.
1 Introduction
Functional programming is founded on the lambda calculus, which is a mathematical theory that provides a sound basis for work on reduction machines [5]. This is p...
Compiling Lazy Functional Languages: An introduction
, 1987
Abstract
Machine (FAM) [Car83] and the Categorical Abstract Machine (CAM) [CCM85] can be considered as variations of the SECD theme. Wadsworth [Wad71] describes an interpreter for the λ-calculus which performs normal-order graph reduction. In graph reduction, the expression being reduced is represented by a directed graph (in Wadsworth's reducer it is also acyclic). When a reduction rule is applied, be it β-reduction as in this case, or e.g. combinator reduction, the root of the reducible expression is overwritten with the result of the reduction. In Wadsworth's graph reducer, when applying the reduction rule (λv.e)e′ → e[e′/v], a copy of the graph of the body e is created, with pointers to e′ substituted for free occurrences of v; if v occurs twice or more, e′ thus becomes shared. When reducing a shared subgraph, all other uses of this subgraph benefit from the first reduction. Wadsworth coins the term call-by-need for the mechanism whereby an expression is reduced at most o...
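The sharing behaviour this passage describes (a node is overwritten with its result, so every later use sees the already-reduced form) can be sketched with an updatable thunk. This is an illustrative model of call-by-need, not Wadsworth's implementation; all names are invented.

```python
# Toy model of call-by-need sharing: a thunk cell is overwritten with its
# value on first use, so other references to the same cell see the result.

class Thunk:
    def __init__(self, compute):
        self.compute = compute
        self.forced = False
        self.value = None
        self.evaluations = 0              # counts how often real work happens

    def force(self):
        if not self.forced:               # reduce at most once...
            self.evaluations += 1
            self.value = self.compute()   # ...overwriting the node with the result
            self.forced = True
        return self.value                 # later uses just read the value

shared = Thunk(lambda: 2 + 3)             # one graph node...
double = lambda t: t.force() + t.force()  # ...referenced twice
print(double(shared), shared.evaluations) # the work 2 + 3 is done only once
```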
PACE: A Prototype Design
Abstract
The PACE architecture exploits the potential for fine-grained parallelism in the execution of functional languages. It is an extensible, distributed-memory, multiprocessor architecture that is designed specifically to support the graph reduction model of computation. PACE differs from most other research projects in this area in that it advocates the use of a specially designed processor, rather than currently available devices, as the basic replicable node. In this paper we present the design of a prototype version of the new processor, together with some results obtained by simulating the parallel execution of example programs on both a detailed Verilog description of the hardware and a much faster C simulator.
Keywords: parallel SK combinator graph reduction
1 Introduction
Parallel processing is currently making great strides forward in applications (typically scientific) where data-parallelism is easily identified. Architectures consisting of collections of conventional processor...
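The "SK combinator graph reduction" named in the keywords rests on two rewrite rules, K x y → x and S f g x → f x (g x), plus I x → x. A minimal sequential sketch follows, with an invented term encoding and leftmost-outermost contraction of the top application spine only; it illustrates the rules, not PACE's parallel machinery.

```python
# Minimal sequential SK reduction (illustrative encoding, not PACE's):
# terms are 'S' | 'K' | 'I' | (fun, arg).

def step(t):
    """One reduction step at the head of t, or None if no rule applies."""
    head, args = t, []
    while isinstance(head, tuple):          # unwind the application spine
        head, arg = head
        args.insert(0, arg)
    if head == 'I' and len(args) >= 1:      # I x -> x
        new, rest = args[0], args[1:]
    elif head == 'K' and len(args) >= 2:    # K x y -> x
        new, rest = args[0], args[2:]
    elif head == 'S' and len(args) >= 3:    # S f g x -> f x (g x)
        f, g, x = args[:3]
        new, rest = ((f, x), (g, x)), args[3:]
    else:
        return None
    for arg in rest:                        # reapply any remaining arguments
        new = (new, arg)
    return new

def normalize(t, limit=1000):
    """Contract head redexes until none remains (bounded, to stay total)."""
    for _ in range(limit):
        nxt = step(t)
        if nxt is None:
            return t
        t = nxt
    return t

print(normalize(((('S', 'K'), 'K'), 'A')))  # S K K A -> K A (K A) -> A
```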
Cache Behaviour of Lazy Functional Programs
, 1992
Abstract
To deepen our quantitative understanding of the performance of lazy evaluation, we have studied the cache behaviour of a benchmark of functional programs. The compiler, based on the G-machine style of graph reduction, has been modified to insert monitoring code into the executable that records instruction and data references at run time. The resulting address trace is used to drive a cache simulator that computes statistics like miss rates and traffic ratios. A number of experiments with different cache parameters (size, associativity, etc.) shows that the benchmark programs have a strong spatial locality in their memory references. This is caused by the heap allocation strategy that allocates nodes by advancing a pointer through the heap, generating new addresses. Therefore the initialisation of new heap nodes results in cache misses, which dominate performance. Comparisons with results of other functional language implementations confirm this behaviour.
1 Introduction
Recently, com...
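The locality argument in the abstract (a bump pointer initialises consecutive heap words, so each cache line misses once and the rest of its words hit) can be modelled with a toy miss counter. The line size and the model are illustrative, not the paper's simulator.

```python
# Toy miss counter for bump-pointer allocation (parameters invented):
# initialising consecutive heap words touches each cache line once,
# giving one cold miss per line and hits for the remaining words.

LINE_WORDS = 8                      # assumed words per cache line

def misses_for_bump_allocation(n_words):
    """Cold misses when n_words consecutive heap words are initialised."""
    misses, current_line = 0, None
    for addr in range(n_words):
        line = addr // LINE_WORDS
        if line != current_line:    # first touch of a new line: miss
            misses += 1
            current_line = line
        # subsequent words in the same line hit
    return misses

# 64 consecutive words span 8 lines: 8 misses, 56 hits
print(misses_for_bump_allocation(64))
```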
PACE: A Multiprocessor Architecture Dedicated to Graph Reduction
, 1995
Abstract
The PACE architecture exploits the potential for fine-grained parallelism in graph reduction. It employs a specially designed processor, rather than stock hardware, as the basic replicable node. We present an outline design of a prototype processor, together with an empirical evaluation using simulation.
1 Introduction
Parallel processing has made great strides forward in applications (typically scientific) where data-parallelism is easily identified. Architectures consisting of collections of conventional processors can provide good speedups for such applications. However, there is a whole class of applications which are not so well served. This is because performance for these applications is largely determined by how efficiently work allocation, interprocessor communication, distributed garbage collection, etc., can be implemented. These tasks have to be performed concurrently by the same computing resource, namely the basic replicable node. This implies that each node has to be an eff...
Lambda Calculus and Functional Programming
Abstract
This paper deals with the problem of a program that is essentially the same over any of several types but which, in the older imperative languages, must be rewritten for each separate type. For example, a sort routine may be written with essentially the same code except for the types, for integers, booleans, and strings. It is clearly desirable to have a method of writing a piece of code that can accept the specific type as an argument. Milner developed his ideas in terms of type assignment to lambda terms. It is based on a result due originally to Curry (Curry 1969) and Hindley (Hindley 1969) known as the principal type-scheme theorem, which says that (assuming that the typing assumptions are sufficiently well-behaved) every term has a principal type scheme, which is a type scheme such that every other type scheme which can be proved for the given term is obtained by a substitution of types for type variables. This use of type schemes allows a kind of generality over all types, which is known as polymorphism.
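The principal-type-scheme property described above can be made concrete: every other typing of a term arises by substituting types for the scheme's type variables. A small sketch with an invented type representation (not Milner's notation):

```python
# Every typing of a term is an instance of its principal type scheme,
# obtained by a substitution of types for type variables. Types here:
#   ('var', name) | ('con', name) | ('fun', domain, codomain)

def subst(s, t):
    """Apply a substitution s (type-variable name -> type) to type t."""
    if t[0] == 'var':
        return s.get(t[1], t)       # replace the variable if s binds it
    if t[0] == 'fun':
        return ('fun', subst(s, t[1]), subst(s, t[2]))
    return t                        # type constants are unchanged

# Principal type scheme of the identity term: a -> a
principal = ('fun', ('var', 'a'), ('var', 'a'))

# Two of its instances: int -> int, and (int -> bool) -> (int -> bool)
print(subst({'a': ('con', 'int')}, principal))
print(subst({'a': ('fun', ('con', 'int'), ('con', 'bool'))}, principal))
```

This substitution-of-instances view is exactly the generality over types, polymorphism, that the abstract names.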
The Reduceron reconfigured and re-evaluated
Abstract
A new version of a special-purpose processor for running lazy functional programs is presented. This processor – the Reduceron – exploits parallel memories and dynamic analyses to increase evaluation speed, and is implemented using reconfigurable hardware. Compared to a more conventional functional language implementation targeting a standard RISC processor running on the same reconfigurable hardware, the Reduceron offers a significant improvement in run-time performance.