## Implementing Functional Languages with Fast Equality, Sets and Maps: an Exercise in Hash Consing (1994)

Venue: | Bull S.A. Research Center, rue Jean-Jaurès, 78340 Les Clayes sous Bois |

Citations: | 6 - 3 self |

### BibTeX

@TECHREPORT{Goubault94implementingfunctional,

author = {Jean Goubault},

title = {Implementing Functional Languages with Fast Equality, Sets and Maps: an Exercise in Hash Consing},

institution = {Bull S.A. Research Center, rue Jean-Jaur{\`e}s, 78340 Les Clayes sous Bois},

year = {1994}

}

### Abstract

We investigate hash consing, a memory allocation strategy for functional languages. Though the idea is not new, its systematic use as a foundation for the run-time system of a language is new. We call this systematic approach maximal sharing. This strategy is shown to be implementable in practice with small speed and space penalties, while offering great opportunities to save space and execution time in big projects. Besides, it paves the way towards more efficient implementations of very desirable data structures like sets and maps (set-theoretic functions of finite domain) [23], opening the door to a whole slew of set- and map-based functional languages like POPS-Lisp [18] and HimML [20], a variant of Standard ML written by the author. The average-case complexities of operations on sets and maps are investigated, and shown to be quite good indeed. Computation sharing and incremental computation are briefly considered in this framework. Garbage collection techniques are reviewed to d...
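The core idea of the abstract can be illustrated with a minimal sketch, written here in Python rather than in the run-time system the report describes. It assumes atoms are hashable values, and the names `Cell`, `HashConsTable` and `from_list` are hypothetical, introduced only for this illustration. Every cons cell is allocated through a table, so structurally equal data is represented by a single shared object and equality reduces to an identity test. Unlike the report, this toy table keeps strong references and never reclaims cells.

```python
class Cell:
    """A cons cell, treated as immutable; create it only via HashConsTable.cons."""
    __slots__ = ("car", "cdr")

    def __init__(self, car, cdr):
        self.car = car
        self.cdr = cdr


class HashConsTable:
    """Toy hash-consing table (illustrative assumption, not the report's design)."""

    def __init__(self):
        # Maps a structural key to the unique cell carrying that structure.
        self._cells = {}

    @staticmethod
    def _key(value):
        # Cells are already unique, so their identity is a valid key;
        # atoms are keyed by type and value (so 1 and True stay distinct).
        if isinstance(value, Cell):
            return ("cell", id(value))
        return ("atom", type(value), value)

    def cons(self, car, cdr):
        key = (self._key(car), self._key(cdr))
        cell = self._cells.get(key)
        if cell is None:
            # First time this structure is built: allocate and record it.
            cell = Cell(car, cdr)
            self._cells[key] = cell
        return cell

    def __len__(self):
        return len(self._cells)


def from_list(table, items):
    """Build a shared list, using None as the empty list."""
    result = None
    for item in reversed(items):
        result = table.cons(item, result)
    return result
```

With this sketch, building the list `[1, 2, 3]` twice through the same table yields the same object, so `a is b` replaces a recursive structural comparison, and a list such as `[0, 2, 3]` shares its tail with `[1, 2, 3]`.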

### Citations

3153 | Graph-based algorithms for Boolean function manipulation
- Bryant
- 1986
Citation Context ...cted acyclic graph (DAG). This is actually the way minimal automata are built, and is the key to the extremely compact representation of propositional formulas called binary decision diagrams or BDDs [8]. In the latter case, evidence has it that the practical space usage is polynomial (and mostly linear) in the number of propositional variables. 2.1.2 Speed Space usage is not the only place where sh... |

1626 | The Definition of Standard ML
- Milner, Tofte, et al.
- 1990
Citation Context ...e shared ("monocopy") lists [16], with conversions between them (making a fresh copy in one direction, copying with sharing in the other direction). The semantics of more recent languages like ML [22][32] do not allow for physical modifications of any kind of structures, but only of special ref pointers. Systematic hash consing fits the semantics nicely. In chapter 2, a simple functional language is p... |

977 | A machine-oriented logic based on the resolution principle - Robinson - 1965 |

773 | Structure and Interpretation of Computer Programs
- Abelson, Sussman, et al.
- 1985
Citation Context ...ons. Finally, we use our fast map algorithms to implement sharing of computations, not data. The problem here is to avoid computing twice the same result. The cure is a technique known as memoization [1], which is implemented nicely with maps. Another, subtler problem is incremental computation, that is, how to compute the result of a function on some data that is similar but not identical to some pr... |

674 | Systematic Software Development Using VDM
- Jones
- 1990
Citation Context ...ce and execution time in big projects. Besides, it paves the way towards more efficient implementations of very desirable data structures like sets and maps (set-theoretic functions of finite domain) [23], opening the door to a whole slew of set- and map-based functional languages like POPS-Lisp [18] and HimML [20], a variant of Standard ML written by the author. The average-case complexities of opera... |

634 | Compiling with Continuations
- Appel
- 1992
Citation Context ...e rare cells in older spaces that point to cells in the newest space. This list can be maintained at reference modification time, since only references may point to objects newer than themselves (see [4]). • sweep through the newest space, to free most of the unused cells. Other, older spaces are not touched. When the newest space S is still filled up after GC (say, to 85% of its capacity; this ... |

326 | Symbolic Model Checking: 10^20 States and Beyond
- Burch, Clarke, et al.
- 1992
Citation Context ...he original formula). BDDs have been used to deal with combinatorial problems that seemed intractable before them, for example proving the correctness of a sequential machine with up to 10^130 states [9] (actually, the efficiency of BDD-based algorithms does not depend on the number of states of the machine as much as on the nature of the machine). In general, we infer that, as most data structures i... |

259 | Sorting and searching, volume 3 of The Art of Computer Programming
- Knuth
- 1973
Citation Context ... function of two integers. Experience has shown that a dumb hash function as (((unsigned long)car)+((unsigned long)cdr)) % ALLOC_HASH_SIZE (in C) gives good results, provided ALLOC_HASH_SIZE is prime [25]. • Memory usage: whenever we allocate a new cons cell x::y (that is, one that has never been allocated), we need to fetch a new cons cell to store x and y, and another one to insert this cons cell... |

218 | List processing in real time on a serial computer
- Baker
- 1978
Citation Context ...stock hardware, some additional instructions are needed to test for a forwarding pointer, and find the real object if it has been forwarded. This slows down the whole program a bit, but not that much [5]. Moreover, if we don't insist on having incremental garbage collection, forwarding pointers will live only at garbage collection time, and the problem disappears. There is also a solution to the firs... |

218 | On-the-fly garbage collection: an exercise in cooperation
- Dijkstra, Lamport, et al.
- 1978
Citation Context ...t on other algorithms. An incremental version of mark-and-sweep has been realized in Meta-VLisp [39], which is simpler than the other mark-and-sweep incremental garbage collectors known to the author [11][6] (though [11] is too slow in practice [12]); an algorithm based on a clever coding of reference counters, using the fact that most used cells are referenced exactly once, is presented in [10]. We h... |

163 | Miranda: A non-strict functional language with polymorphic types. LNCS 201
- Turner
- 1985
Citation Context ... to the map {x1 ↦ UNIT, ..., xn ↦ UNIT}, where UNIT is a special object of the language. Sets have already been considered of a fundamental importance in languages like SETL [41] or MIRANDA [44][45]. However, we are convinced that it is the more general notion of maps that is crucial. This is vindicated by personal experience, and also justified by a study [30], where it is shown that most data ... |

161 | The Icon Programming Language - Griswold, Griswold - 1983 |

156 | Programming With Sets: An Introduction to SETL
- Schwartz, Dewar, et al.
- 1986
Citation Context ...t {x1, ..., xn} to the map {x1 ↦ UNIT, ..., xn ↦ UNIT}, where UNIT is a special object of the language. Sets have already been considered of a fundamental importance in languages like SETL [41] or MIRANDA [44][45]. However, we are convinced that it is the more general notion of maps that is crucial. This is vindicated by personal experience, and also justified by a study [30], where it is s... |

153 | Seminumerical Algorithms, volume 2 of The Art of Computer Programming
- Knuth
- 1981
Citation Context ... over [0, 1[. This is actually a quite reasonable assumption in practice (sequences of reversed strings of bits are called Wald sequences in the literature, and are used to implement random sequences [26]). We shall also assume that building a couple (a ConsCell) takes constant time, which is the case in practice (see appendix on benchmarks). As for mathematical notation, we shall write p! the factori... |

153 | Linear unification
- Paterson, Wegman
- 1978
Citation Context ...e needed for its construction. Actually, there exists a clever algorithm that shares all unified subtrees as soon as they are recognized, and that takes linear time in terms of the sizes of the inputs [36]. It has the drawback that it destroys the input while proceeding; but there exists a non-destructive version that is almost linear. A second example is the problem of building and using decision tree... |

133 | An Efficient Incremental Automatic Garbage Collector
- Deutsch, Bobrow
- 1976
Citation Context ...uthor [11][6] (though [11] is too slow in practice [12]); an algorithm based on a clever coding of reference counters, using the fact that most used cells are referenced exactly once, is presented in [10]. We have considered the difficulties tied to stop-and-copy style algorithms. Mark-and-sweep variants and reference counting techniques do not seem to pose any problem. However, the hypothesis of [10]... |

130 | The Design of Dynamic Data Structures
- Overmars
- 1983
Citation Context ...t) of all colliding entries. • Balanced trees are here taken to be AVL trees. There are a great number of other balanced trees, including B and B* trees, but we won't bother examining them all (see [35] for a description and complexity analyses). Maps (this paper) A-lists (sorted) A-lists (unsorted) Hash tables (N buckets, sorted) Hash tables (N buckets, unsorted) Balanced trees Space 2.44n cells 2n... |

125 | Fundamental Algorithms, Volume 1 of The Art of Computer Programming - Knuth - 1973 |

112 | Garbage collection can be faster than stack allocation. Information Processing Letters, 25(4):275–279
- Appel
- 1987
Citation Context ...he quickest allocation functions: maintain a free list of unused cons cells, and grab the first one to get a new cell; or, like in Standard ML of New Jersey, simply manage the heap as a growing stack [3] (the garbage collector reclaims storage, not the ML program, and always compacts memory to form a contiguous heap). • for comparison purposes (equal), everything is not rosy; since structurally equ... |

100 | Average-case analysis of algorithms and data structures
- Vitter, Flajolet
- 1990
Citation Context ...less than M, so M must become large, too. Assume that kM = u, 0 < k < 1. Then: ρ_SC = c (1/k − 1)α, ρ_MS = a + bk (γ(u/k) − kγ(u))α. Now, let's use as a guide the following theorem [14][47] (the size of a structure being its number of nodes), which will guide us in the estimation of the sharing rate: Proposition 1 Let Ω be a set of natural numbers containing 0, and T the set of tree... |

72 | Incremental computation via function caching
- Pugh, Teitelbaum
- 1989
Citation Context ...ntal computation, that is, how to compute the result of a function on some data that is similar but not identical to some previously encountered data, with a minimized cost. We use a function caching [37] technique in relation with memoization to realize this. Chapter 2 Maximal sharing: benefits, suitability and costs 2.1 Benefits 2.1.1 Memory usage The benefits we may get from sharing data are many... |

65 | An efficient machine-independent procedure for garbage collection in various list structures
- Schorr, Waite
- 1967
Citation Context ... it usually slows down the execution of a program, compared with a program written with only explicit deallocations, and uses memory to operate (except with the Deutsch, Schorr and Waite algorithm [40][24]), it frees the programmer from explicitly freeing structures, and as such has been recognized of a fundamental importance. Note also that implicit deallocation actually saves space and time: spac... |

49 | A Lisp Garbage Collector for Virtual Memory Computer Systems
- Fenichel, Yochelson
- 1969
Citation Context ...oportional to the number of allocated cells before garbage collection, that is, to the size of the process. The traditional way to get a faster algorithm is to implement a stop-and-copy strategy [33][13]. We shall also have a look at other approaches, including reference counting and incremental garbage collection techniques. 2.4.1 Stop-and-copy The great advantage of stop-and-copy is that it takes t... |

47 | The Definition of Standard ML
- Harper, Milner, et al.
- 1991
Citation Context ...d the shared ("monocopy") lists [16], with conversions between them (making a fresh copy in one direction, copying with sharing in the other direction). The semantics of more recent languages like ML [22][32] do not allow for physical modifications of any kind of structures, but only of special ref pointers. Systematic hash consing fits the semantics nicely. In chapter 2, a simple functional language ... |

46 | An adaptive tenuring policy for generation scavengers
- Ungar, Jackson
- 1992
Citation Context ...es for mark-and-sweep lies, see figure 2.1), then a full GC is done, marking all spaces, and sweeping through the newest one only. This strategy is called a tenuring policy by demographic feedback in [46], and may recover some space, by deallocating ancient cells. If S is still crowded, then allocate a new space S′, which becomes the newest space, and make S the next-to-newest one. Allocate a free l... |

39 | On the performance evaluation of extendible hashing and trie searching
- Flajolet
- 1983
Citation Context ...ne is the smallest achievable height of such a tree. Hence the tree is not balanced, but not too much out of balance either. Proof: This property expresses the average height of a radix-exchange tree [15][47]. We give here another, basic proof. To find the average height of a tree, we must compute Σ_{d=0}^{+∞} d·P_n^{=d}, where P_n^{=d} is the probability that the height of the tree is exactly d. But P_n^{=d} ... |

34 | Anatomy of Lisp
- Allen
- 1978
Citation Context ...p m of cardinal n; 3.2 The two cases when retrieving an element from a map. Chapter 1 Introduction. Hash consing [2] is a well-known technique to share data that are structurally equivalent. Originally, it was used in some Lisp systems, like HLISP [16][43] as a mechanism to save space in large programs. Not only di... |

34 | Monocopy and Associative Algorithms in an Extended LISP
- Goto
- 1974
Citation Context ...Chapter 1 Introduction. Hash consing [2] is a well-known technique to share data that are structurally equivalent. Originally, it was used in some Lisp systems, like HLISP [16][43] as a mechanism to save space in large programs. Not only did this technique save space, but it also improved the run-time efficiency of some algorithms, particularly in computer algebra [17]. The... |

34 | A real time garbage collector based on the lifetimes of objects
- Lieberman, Hewitt
- 1983
Citation Context ...ation scavenging Generation scavenging is a garbage collection method that builds on the fact that in most functional programming languages, values are either ephemeral (they live shortly) or eternal [28], and also that new values tend to point to old ones but not the converse. However, this method is originally an optimization of the stop-and-copy method. Without modification, it is as unusable as st... |

28 | A Lisp garbage collector algorithm using serial secondary storage
- Minsky
- 1963
Citation Context ...s proportional to the number of allocated cells before garbage collection, that is, to the size of the process. The traditional way to get a faster algorithm is to implement a stop-and-copy strategy [33][13]. We shall also have a look at other approaches, including reference counting and incremental garbage collection techniques. 2.4.1 Stop-and-copy The great advantage of stop-and-copy is that it tak... |

27 | Functional programs as executable specifications
- Turner
- 1984
Citation Context ... xn} to the map {x1 ↦ UNIT, ..., xn ↦ UNIT}, where UNIT is a special object of the language. Sets have already been considered of a fundamental importance in languages like SETL [41] or MIRANDA [44][45]. However, we are convinced that it is the more general notion of maps that is crucial. This is vindicated by personal experience, and also justified by a study [30], where it is shown that most d... |

15 | Perfect normal forms for discrete functions
- Billon
- 1987
Citation Context ...red representations of Shannon trees with elimination of tautologies; since then, more efficient versions have been devised, by augmenting the proportion of shared subtrees in Typed Decision Diagrams [7] (TDG). BDDs usually have a size polynomial in the number of propositional variables, they are normal forms for propositional formulas, and logical operations are both easy to implement and fast (gene... |

12 | A complexity calculus for recursive tree algorithms
- Flajolet, Steyaert
- 1987
Citation Context ...ays less than M, so M must become large, too. Assume that kM = u, 0 < k < 1. Then: ρ_SC = c (1/k − 1)α, ρ_MS = a + bk (γ(u/k) − kγ(u))α. Now, let's use as a guide the following theorem [14][47] (the size of a structure being its number of nodes), which will guide us in the estimation of the sharing rate: Proposition 1 Let Ω be a set of natural numbers containing 0, and T the set of ... |

11 | Interlisp Reference Manual
- Teitelman
- 1978
Citation Context ...computation sharing. This aspect has been studied for quite a long time (the so-called memoizing functions were already provided in Ceyx, a package built atop LeLisp, in MacLisp [34] and in InterLisp [42]). Recently, this approach has been extended to deal with incremental computation (that is, computation that minimizes the effort of recomputing when given similar but non identical input data) in [37... |

7 | MacLisp Reference Manual
- Moon
- 1974
Citation Context ...lty. This does actual computation sharing. This aspect has been studied for quite a long time (the so-called memoizing functions were already provided in Ceyx, a package built atop LeLisp, in MacLisp [34] and in InterLisp [42]). Recently, this approach has been extended to deal with incremental computation (that is, computation that minimizes the effort of recomputing when given similar but non identi... |

6 | Tutorial lecture notes on the Irish school of the VDM
- Mac an Airchinnigh
- 1991
Citation Context ...ages like SETL [41] or MIRANDA [44][45]. However, we are convinced that it is the more general notion of maps that is crucial. This is vindicated by personal experience, and also justified by a study [30], where it is shown that most data structures in programming are: • sets (dictionaries for instance) • lists (lists, stacks, files, words, vectors, matrices, etc.) • maps, or many-to-one relatio... |

2 | Recursive hashed data structures with applications to polynomial manipulations
- Goto, Kanada
- 1976
Citation Context ...LISP [16][43] as a mechanism to save space in large programs. Not only did this technique save space, but it also improved the run-time efficiency of some algorithms, particularly in computer algebra [17]. The two major drawbacks that hash consing has are that: • it is unsafe to modify a structure that may be shared in an uncontrolled way, • managing the sharing may be costly, both in terms of spe... |

2 | The HimML reference manual. Available from the author
- Goubault
- 1992
Citation Context ...desirable data structures like sets and maps (set-theoretic functions of finite domain) [23], opening the door to a whole slew of set- and map-based functional languages like POPS-Lisp [18] and HimML [20], a variant of Standard ML written by the author. The average-case complexities of operations on sets and maps are investigated, and shown to be quite good indeed. Computation sharing and incremental ... |

2 | Algorithms used in an implementation of HLISP
- Terashima
- 1975
Citation Context ...Chapter 1 Introduction. Hash consing [2] is a well-known technique to share data that are structurally equivalent. Originally, it was used in some Lisp systems, like HLISP [16][43] as a mechanism to save space in large programs. Not only did this technique save space, but it also improved the run-time efficiency of some algorithms, particularly in computer algebra [17]. The two... |

1 | Un gestionnaire de mémoire temps réel pour systèmes symboliques
- Beaudoing
- 1991
Citation Context ... other algorithms. An incremental version of mark-and-sweep has been realized in Meta-VLisp [39], which is simpler than the other mark-and-sweep incremental garbage collectors known to the author [11][6] (though [11] is too slow in practice [12]); an algorithm based on a clever coding of reference counters, using the fact that most used cells are referenced exactly once, is presented in [10]. We have... |

1 | Glaneur de cellules parallèle
- Dornic
- 1990
Citation Context ... of mark-and-sweep has been realized in Meta-VLisp [39], which is simpler than the other mark-and-sweep incremental garbage collectors known to the author [11][6] (though [11] is too slow in practice [12]); an algorithm based on a clever coding of reference counters, using the fact that most used cells are referenced exactly once, is presented in [10]. We have considered the difficulties tied to stop-... |

1 | The POPS theorem prover manual. Bull S.A. internal document
- Goubault
- 1991
Citation Context ...ations of very desirable data structures like sets and maps (set-theoretic functions of finite domain) [23], opening the door to a whole slew of set- and map-based functional languages like POPS-Lisp [18] and HimML [20], a variant of Standard ML written by the author. The average-case complexities of operations on sets and maps are investigated, and shown to be quite good indeed. Computation sharing a... |

1 | The revised report on POPS proving techniques - Goubault - 1991 |

1 | Leygues is chief of the GUI department at the Bull Research
- communication
Citation Context ... and in particular in object-oriented systems, to develop the notion of relation (mapping some object to other objects, i.e. maps) as a central notion, rivalling in importance with inheritance itself [27]. 3.1.2 Illustration To give an illustration, in the POPS theorem prover prototype [19], we have implemented a structure to facilitate reasoning on first order theories in clausal form, in the presenc... |

1 | Algorithms for list structure condensation
- Lindstrom
- 1973
Citation Context ...The benefits we may get from sharing data are manifold. The most obvious benefit is that memory usage may shrink considerably. This was historically the first motive; hash consing and list condensing [29] formed one solution and cdr-coding was the main other [2]. Successful experiments in hash consing have been reported, notably in computer algebra systems [17]. In general, most AI applications use a l... |

1 | Traité de Programmation Applicative
- Saint-James
- 1992
Citation Context ...may be split into two groups: the ones relying on stop-and-copy variants [5], and the ones that are built on other algorithms. An incremental version of mark-and-sweep has been realized in Meta-VLisp [39], which is simpler than the other mark-and-sweep incremental garbage collectors known to the author [11][6] (though [11] is too slow in practice [12]); an algorithm based on a clever coding of referen... |