## Multiset discrimination (2003)

Venue: In preparation

Citations: 3 (1 self)

### BibTeX

@INPROCEEDINGS{Henglein03multisetdiscrimination,
    author = {Fritz Henglein},
    title = {Multiset discrimination},
    booktitle = {In preparation},
    year = {2003}
}

### Abstract

Multiset discrimination is a fundamental technique for finding duplicates in linear time without hashing or comparison-based sorting. It can be viewed as a generalization of equality (or equivalence) testing from two arguments to an arbitrary number of arguments, since it decides all the pairwise equalities between its inputs in one go by grouping them into equivalence classes. In this paper we provide a general framework for multiset discrimination suitable for packaging multiset discriminators as a reusable software component. It shows how multiset discriminators can be defined polytypically; that is, inductively on the type structure of the input data. The polytypic discriminators are optimal for data structures without sharing. We show how linear-time multiset discriminators can be defined for shared, acyclic data. Finally, we point out that three seemingly different algorithms on partition refinement for circular [...] solve certain instances of multiset discrimination for [...]. We conclude by pulling them together into a single algorithm. This allows extending multiset discrimination to abstract data types and type constructors and suggests that multiset discrimination should be built as base functionality into types, generalizing equality. The algorithmic ingredients behind multiset discrimination have been published before, though under disparate names and for special instances of multiset discrimination. Our contribution lies in demonstrating that they can be combined for multiset discrimination in basically arbitrary cyclic data structures in time O(m log n) for data structures with m edges and n nodes. We provide general considerations for applying multiset discrimination vis-à-vis hashing and (comparison-based) sorting and give some empirical evidence of its practical efficiency.
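
The idea of defining discriminators inductively on the type structure can be illustrated with a minimal Python sketch. It is not the paper's code: the names `disc_nat`, `disc_pair`, `disc_by` and `partition` are hypothetical, and the sketch assumes keys built from small non-negative integers and pairs only. A discriminator here takes (key, value) pairs and returns the values grouped by key equality, using a reusable bucket table rather than hashing or comparisons.

```python
from typing import Any, Callable, List, Sequence, Tuple

# A discriminator takes (key, value) pairs and returns the values grouped
# so that two values share a group exactly when their keys are equal.
Disc = Callable[[Sequence[Tuple[Any, Any]]], List[List[Any]]]


def disc_nat(bound: int) -> Disc:
    """Discriminator for integer keys in range(bound), using a bucket table."""
    table: List[Any] = [None] * bound  # allocated once, reused by every call

    def disc(pairs):
        used = []  # keys in order of first occurrence
        for key, value in pairs:
            if table[key] is None:
                table[key] = []
                used.append(key)
            table[key].append(value)
        groups = [table[key] for key in used]
        for key in used:  # reset only the touched buckets
            table[key] = None
        return groups

    return disc


def disc_pair(d1: Disc, d2: Disc) -> Disc:
    """Discriminator for pair keys, built from discriminators for the components."""

    def disc(pairs):
        # Stage 1: group by first component, carrying (second component, value).
        stage1 = d1([(key[0], (key[1], value)) for key, value in pairs])
        # Stage 2: refine each group by the second component.
        out = []
        for group in stage1:
            out.extend(d2(group))
        return out

    return disc


def disc_by(f: Callable[[Any], Any], d: Disc) -> Disc:
    """Discriminate under the equivalence 'f(k1) == f(k2)' using d on f's range."""
    return lambda pairs: d([(f(key), value) for key, value in pairs])


def partition(d: Disc, xs: Sequence[Any]) -> List[List[Any]]:
    """Group the elements of xs into equivalence classes using discriminator d."""
    return d([(x, x) for x in xs])


pair_disc = disc_pair(disc_nat(10), disc_nat(10))
print(partition(pair_disc, [(1, 2), (3, 4), (1, 2), (1, 5), (3, 4)]))
# → [[(1, 2), (1, 2)], [(1, 5)], [(3, 4), (3, 4)]]
```

Each type constructor (here only pairs) is given a discriminator constructor that takes discriminators for its component types, which is the inductive pattern the abstract refers to. Lists, sums, bags, sets and shared or cyclic data need further machinery that this sketch does not attempt.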

### Citations

2432 | The Design and Analysis of Computer Algorithms - Aho, Hopcroft, et al. - 1974 |

363 | Types, abstraction and parametric polymorphism - Reynolds - 1983 |

353 | Three partition refinement algorithms - Paige, Tarjan - 1987 |
Citation Context: ...tion for bag-valued nodes by multiset discrimination on numbers. This theorem when restricted to stores that contain bag-values only, has been proved by Cardon and Crochemore [AM82]. Paige and Tarjan [PT87] presented a somewhat simplified algorithm with the same bounds. The above theorem combines 8.3 Incremental discrimination for sets Dagification for set-values is the most difficult dagification subcl...

328 | Theorems for free - Wadler - 1989 |

285 | An n log n algorithm for minimizing the states in a finite automaton - Hopcroft - 1971 |
Citation Context: ...s efficient incremental computation of the necessary updates by employing a “modify-the-smaller-half” principle, which was introduced by Hopcroft for the minimization of deterministic finite automata [Hop71]. We shall see that, using the transformational algorithmic ideas of [CP89], the algorithms can be combined into a single framework that allows dagification of arbitrary stores; that is, stores where ...

143 | Data Structure and Algorithms 1: Sorting and Searching - Mehlhorn - 1984 |

105 | Variations on the common subexpression problem - Downey, Sethi, et al. - 1980 |
Citation Context: ...plies $n \cong_R n'$. □ Other staging relations are possible and indeed necessary if the closure conditions are extended with ground axioms relating nodes at different heights. See Downey, Sethi, Tarjan [DST80] for an example of this. With a staging relation as a guide we can dagify an acyclic store in bottom-up fashion in a single pass, as shown in Figure 3. Lemma 7.5 Let R be any relation such that R = Eq...

92 | Polytypic programming - Jeuring, Jansson - 1996 |
Citation Context: ...τ of type δτ for each τ, where Γ maps each type variable α occurring free in τ to a function Γ(α) of type δα. Note that ∆ Γ τ is defined polytypically, that is by induction on the type structure of τ [JJ96]. In particular, each k-ary type constructor is mapped to a k-ary discriminator constructor, which input discriminators for the input types of the type constructor and produces a discriminator for its...

58 | Program derivation by fixed point computation - Cai, Paige - 1989 |
Citation Context: ...and m the number of (multi)edges. The algorithms have a common algorithmic structure: formulate the problem as a (greatest) fixed point problem and then compute the result by dominated convergence [CP89] to enable efficient incremental computation in each iteration step. The key step, solved differently in each case, is efficient incremental computation of the necessary updates by employing a “modify...

37 | Using multiset discrimination to solve language processing problems without hashing, Theoret - Cai, Paige - 1995 |
Citation Context: ... input list. The definition of discriminator the problem of multiset discrimination as partitioning its input under some given equivalence relation (such as equality) was generalized by Cai and Paige [CP95] to allow for discrimination under the equivalence ≡f given by an additional function argument f: Input elements v1, v2 are equivalent, v1 ≡f v2, if they are mapped to the same range value under f; th...

29 | A linear time solution to the single function coarsest partition problem. Theoretical Computer Science 40 - Paige, Tarjan, Bonic - 1985 |

16 | Data Structures and Network Flow Algorithms. Volume CMBS - Tarjan - 1983 |
Citation Context: ... simplifying assumption is justified as long as the number of store addresses we require in an algorithm is reasonably related to the input size of the problem given problem instance; see e.g. Tarjan [Tar83]. • The size of a pair is the sum of the sizes of each component, even if the components are identical. The size function measures the space required to store a value in unboxed (endogenous) or in box...

9 | Partitioning a graph in O(|A| log 2 - Cardon, Crochemore - 1982 |
Citation Context: ...blem for lists, calling it the common subexpression problem [DST80]; Cardon and Crochemore solved it for bags, referring to it as finding the coarsest regular congruence refining an initial partition [AM82]; and Paige and Tarjan solved it for sets, calling it the relational coarsest partition problem [PT87, Section 3]. (Paige and Tarjan also gave a simpler algorithm than Cardon and Crochemore’s for bags...

7 | Efficient algorithms for isomorphisms of simple types - Zibin, Gil, et al. - 2003 |

2 | Analysis and Transformation of Set-Theoretic Languages. Mini-Course - Paige - 1995 |
Citation Context: ...dering, however: The above statement is correct for any total order ≤. Sorting according to an arbitrary data-dependent ordering, which the algorithm itself finds, has been called weaksorting by Paige [Pai95]: A function f : list(bag(τ)) → list(list(τ)) is a weakly sorting function (or weak-sorter) if there exists a total order ≤ on τ such that 1. for $f[\vec{v}_1, \ldots, \vec{v}_k] = [\vec{v}'_1, \ldots, \vec{v}'_l]$ we hav...

1 | A linear time algorithm to solve the single function coarsest partition problem - Paige, Tarjan - 1984 |