Results 1 - 10
of
31
Once Upon a Type
- In Functional Programming Languages and Computer Architecture
, 1995
"... A number of useful optimisations are enabled if we can determine when a value is accessed at most once. We extend the Hindley-Milner type system with uses, yielding a typeinference based program analysis which determines when values are accessed at most once. Our analysis can handle higher-order fun ..."
Abstract
-
Cited by 77 (2 self)
- Add to MetaCart
A number of useful optimisations are enabled if we can determine when a value is accessed at most once. We extend the Hindley-Milner type system with uses, yielding a typeinference based program analysis which determines when values are accessed at most once. Our analysis can handle higher-order functions and data structures, and admits principal types for terms. Unlike previous analyses, we prove our analysis sound with respect to call-by-need reduction. Call-by-name reduction does not provide an accurate model of how often a value is used during lazy evaluation, since it duplicates work which would actually be shared in a real implementation. Our type system can easily be modified to analyse usage in a call-by-value language. 1 Introduction This paper describes a method for determining when a value is used at most once. Our method is based on a simple modification of the Hindley-Milner type system. Each type is labelled to indicate whether the corresponding value is used at most onc...
Deriving a Lazy Abstract Machine
- Journal of Functional Programming
, 1997
"... Machine Peter Sestoft Department of Mathematics and Physics Royal Veterinary and Agricultural University Thorvaldsensvej 40, DK-1871 Frederiksberg C, Denmark E-mail: sestoft@dina.kvl.dk Version 6 of March 13, 1996 Abstract We derive a simple abstract machine for lazy evaluation of the lambda cal ..."
Abstract
-
Cited by 57 (2 self)
- Add to MetaCart
Machine Peter Sestoft Department of Mathematics and Physics Royal Veterinary and Agricultural University Thorvaldsensvej 40, DK-1871 Frederiksberg C, Denmark E-mail: sestoft@dina.kvl.dk Version 6 of March 13, 1996 Abstract We derive a simple abstract machine for lazy evaluation of the lambda calculus, starting from Launchbury's natural semantics. Lazy evaluation here means non-strict evaluation with sharing of argument evaluation, that is, call-by-need. The machine we derive is a lazy version of Krivine's abstract machine, which was originally designed for call-by-name evaluation. We extend it with datatype constructors and base values, so the final machine implements all dynamic aspects of a lazy functional language. 1 Introduction The development of an efficient abstract machine for lazy evaluation usually starts from either a graph reduction machine or an environment machine. Graph reduction machines perform substitution by rewriting the term graph, that is, the program itself...
Compiling Haskell by program transformation: a report from the trenches
- In Proc. European Symp. on Programming
, 1996
"... Many compilers do some of their work by means of correctness-preserving, and hopefully performance-improving, program transformations. The Glasgow Haskell Compiler (GHC) takes this idea of "compilation by transformation" as its war-cry, trying to express as much as possible of the compilation proces ..."
Abstract
-
Cited by 52 (4 self)
- Add to MetaCart
Many compilers do some of their work by means of correctness-preserving, and hopefully performance-improving, program transformations. The Glasgow Haskell Compiler (GHC) takes this idea of "compilation by transformation" as its war-cry, trying to express as much as possible of the compilation process in the form of program transformations. This paper reports on our practical experience of the transformational approach to compilation, in the context of a substantial compiler. The paper appears in the Proceedings of the European Symposium on Programming, Linkoping, April 1996. 1 Introduction Using correctness-preserving transformations as a compiler optimisation is a well-established technique (Aho, Sethi & Ullman [1986]; Bacon, Graham & Sharp [1994]). In the functional programming area especially, the idea of compilation by transformation has received quite a bit of attention (Appel [1992]; Fradet & Metayer [1991]; Kelsey [1989]; Kelsey & Hudak [1989]; Kranz [1988]; Steele [1978]). A ...
Algorithm + Strategy = Parallelism
- JOURNAL OF FUNCTIONAL PROGRAMMING
, 1998
"... The process of writing large parallel programs is complicated by the need to specify both the parallel behaviour of the program and the algorithm that is to be used to compute its result. This paper introduces evaluation strategies, lazy higher-order functions that control the parallel evaluation of ..."
Abstract
-
Cited by 51 (18 self)
- Add to MetaCart
The process of writing large parallel programs is complicated by the need to specify both the parallel behaviour of the program and the algorithm that is to be used to compute its result. This paper introduces evaluation strategies, lazy higher-order functions that control the parallel evaluation of non-strict functional languages. Using evaluation strategies, it is possible to achieve a clean separation between algorithmic and behavioural code. The result is enhanced clarity and shorter parallel programs. Evaluation strategies are a very general concept: this paper shows how they can be used to model a wide range of commonly used programming paradigms, including divideand -conquer, pipeline parallelism, producer/consumer parallelism, and data-oriented parallelism. Because they are based on unrestricted higher-order functions, they can also capture irregular parallel structures. Evaluation strategies are not just of theoretical interest: they have evolved out of our experience in parallelising several large-scale applications, where they have proved invaluable in helping to manage the complexities of parallel behaviour. These applications are described in detail here. The largest application we have studied to date, Lolita, is a 60,000 line natural language parser. Initial results show that for these applications we can achieve acceptable parallel performance, while incurring minimal overhead for using evaluation strategies.
The runtime structure of object ownership
- In ECOOP
, 2006
"... Abstract. Object-oriented programs often require large heaps to run properly or meet performance goals. They use high-overhead collections, bulky data models, and large caches. Discovering this is quite challenging. Manual browsing and flat summaries do not scale to complex graphs with 20 million ob ..."
Abstract
-
Cited by 32 (0 self)
- Add to MetaCart
Abstract. Object-oriented programs often require large heaps to run properly or meet performance goals. They use high-overhead collections, bulky data models, and large caches. Discovering this is quite challenging. Manual browsing and flat summaries do not scale to complex graphs with 20 million objects. Context is crucial to understanding responsibility and inefficient object connectivity. We summarize memory footprint with help from the dominator relation. Each dominator tree captures unique ownership. Edges between trees capture responsibility. We introduce a set of ownership structures, and quantify their abundance. We aggregate these structures, and use thresholds to identify important aggregates. We introduce the ownership graph to summarize responsibility, and backbone equivalence to aggregate patterns within trees. Our implementation quickly generates concise summaries. In two minutes, it generates a 14-node ownership graph from 29 million objects. Backbone equivalence identifies a handful of patterns that account for 80 % of a tree’s footprint. 1
Lag, Drag, Void and Use - Heap Profiling and Space-Efficient Compilation Revisited
- In Proc. Intl. Conf. on Functional Programming
, 1996
"... The context for this paper is functional computation by graph reduction. Our overall aim is more efficient use of memory. The specific topic is the detection of dormant cells in the live graph --- those retained in heap memory though not actually playing a useful role in computation. We describe a p ..."
Abstract
-
Cited by 16 (3 self)
- Add to MetaCart
The context for this paper is functional computation by graph reduction. Our overall aim is more efficient use of memory. The specific topic is the detection of dormant cells in the live graph --- those retained in heap memory though not actually playing a useful role in computation. We describe a profiler that can identify heap consumption by such `useless' cells. Unlike heap profilers based on traversals of the live heap, this profiler works by examining cells postmortem. The new profiler has revealed a surprisingly large proportion of `useless' cells, even in some programs that previously seemed space-efficient such as the boot-strapping Haskell compiler nhc. 1 Introduction A typical computation by graph reduction involves a large and changing population of heap-memory cells. Taking a census of this population at regular intervals can be very instructive, both for functional programmers and for functional-language implementors. A heap profiler [RW93] records population counts for ...
Profiling Large-Scale Lazy Functional Programs
- JOURNAL OF FUNCTIONAL PROGRAMMING
, 1998
"... The LOLITA natural language processor is an example of one of the ever-increasing number of large-scale systems written entirely in a functional programming language. The system consists of over 47,000 lines of Haskell code (excluding comments) and is able to perform a wide range of tasks such as se ..."
Abstract
-
Cited by 14 (3 self)
- Add to MetaCart
The LOLITA natural language processor is an example of one of the ever-increasing number of large-scale systems written entirely in a functional programming language. The system consists of over 47,000 lines of Haskell code (excluding comments) and is able to perform a wide range of tasks such as semantic and pragmatic analysis of text, information extraction and query analysis. The efficiency of such a system is critical; interactive tasks (such as query analysis) must ensure that the user is not inconvenienced by long pauses, and batch mode tasks (such as information extraction) must ensure that an adequate throughput can be achieved. For the past three years the profiling tools supplied with GHC and HBC have been used to analyse and reason about the complexity of the LOLITA system. There have been good results, however experience has shown that in a large system the profiling life-cycle is often too long to make detailed analysis possible, and the results are often misleading. In response to these problems a profiler has been developed which allows the complete set of program costs to be recorded in so-called cost-centre stacks. These program costs are then analysed using a post-processing tool to allow the developer to explore the costs of the program in ways that are either not possible with existing tools or would require repeated compilations and executions of the program. The modifications to the Glasgow Haskell compiler based on detailed cost semantics and an efficient implementation scheme are discussed. The results of using this new profiling tool in the analysis of a number of Haskell programs are also presented. The overheads of the scheme are discussed and the benefits of this new system are considered. An outline is also given of how this approach can be modified to assist with the tracing and debugging of programs.
Comprehensive Profiling Support in the Java Virtual Machine
, 1999
"... Existing profilers for Java applications typically rely on custom instrumentation in the Java virtual machine, and measure only limited types of resource consumption. Garbage collection and multi-threading pose additional challenges to profiler design and implementation. In this paper we discuss a ..."
Abstract
-
Cited by 11 (0 self)
- Add to MetaCart
Existing profilers for Java applications typically rely on custom instrumentation in the Java virtual machine, and measure only limited types of resource consumption. Garbage collection and multi-threading pose additional challenges to profiler design and implementation. In this paper we discuss a general-purpose, portable, and extensible approach for obtaining comprehensive profiling information from the Java virtual machine. Profilers based on this framework can uncover CPU usage hot spots, heavy memory allocation sites, unnecessary object retention, contended monitors, and thread deadlocks. In addition, we discuss a novel algorithm for thread-aware statistical CPU time profiling, a heap profiling technique independent of the garbage collection implementation, and support for interactive profiling with minimum overhead.
Space Profiling for Parallel Functional Programs
"... This paper presents a semantic space profiler for parallel functional programs. Building on previous work in sequential profiling, our tools help programmers to relate runtime resource use back to program source code. Unlike many profiling tools, our profiler is based on a cost semantics. This provi ..."
Abstract
-
Cited by 9 (2 self)
- Add to MetaCart
This paper presents a semantic space profiler for parallel functional programs. Building on previous work in sequential profiling, our tools help programmers to relate runtime resource use back to program source code. Unlike many profiling tools, our profiler is based on a cost semantics. This provides a means to reason about performance without requiring a detailed understanding of the compiler or runtime system. It also provides a specification for language implementers. This is critical in that it enables us to separate cleanly the performance of the application from that of the language implementation. Some aspects of the implementation can have significant effects on performance. Our cost semantics enables programmers to understand the impact of different scheduling policies yet abstracts away from many of the details of their implementations. We show applications where the choice of scheduling policy has asymptotic effects on space use. We explain these use patterns through a demonstration of our tools. We also validate our methodology by observing similar performance in our implementation of a parallel extension of Standard ML.

