## Multiset discrimination (2003)

### BibTeX

@INPROCEEDINGS{Henglein03multisetdiscrimination,

author = {Fritz Henglein},

title = {Multiset discrimination},

booktitle = {In preparation},

year = {2003}

}

### Abstract

Multiset discrimination is a fundamental technique for finding duplicates in linear time without hashing or comparison-based sorting. It can be viewed as a generalization of equality (or equivalence) testing from two arguments to an arbitrary number of arguments, since it decides all the pairwise equalities between its inputs in one go by grouping them into equivalence classes. In this paper we provide a general framework for multiset discrimination suitable for packaging multiset discriminators as a reusable software component. It shows how multiset discriminators can be defined polytypically; that is, inductively on the type structure of the input data. The polytypic discriminators are optimal for data structures without sharing. We show how linear-time multiset discriminators can be defined for shared, acyclic data. Finally, we point out that three seemingly different algorithms on partition refinement for circular [...] solve certain instances of multiset discrimination for [...]. We conclude by pulling them together into a single algorithm. This allows extending multiset discrimination to abstract data types and type constructors and suggests that multiset discrimination should be built as base functionality into types, generalizing equality. The algorithmic ingredients behind multiset discrimination have been published before, though under disparate names and for special instances of multiset discrimination. Our contribution lies in demonstrating that they can be combined for multiset discrimination in basically arbitrary cyclic data structures in time O(m log n) for data structures with m edges and n nodes. We provide general considerations for applying multiset discrimination vis-à-vis hashing and (comparison-based) sorting and give some empirical evidence of its practical efficiency.
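
The grouping idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's code: the names `disc_nat`, `disc_pair`, and `disc_list` are hypothetical, and the sketch assumes keys are small non-negative integers, pairs, or lists built from them. It covers only unshared data, not the shared acyclic or cyclic cases. A discriminator here takes a list of `(key, value)` pairs and returns the values grouped by equivalent key, using neither hashing nor key comparisons.

```python
def disc_nat(bound):
    """Base discriminator for integer keys in range(bound) (assumed range).

    The bucket table is allocated once and reused, so each call costs time
    proportional to its input, not to `bound`.
    """
    table = [[] for _ in range(bound)]

    def run(pairs):
        used = []                      # keys seen, in order of first occurrence
        for key, value in pairs:
            if not table[key]:
                used.append(key)
            table[key].append(value)
        groups = []
        for key in used:
            groups.append(table[key])
            table[key] = []            # reset only the buckets that were touched
        return groups

    return run


def disc_pair(d1, d2):
    """Discriminator for pair keys, built from discriminators for the components."""
    def run(pairs):
        # Group by the first component, carrying (second component, value) along.
        by_first = d1([(key[0], (key[1], value)) for key, value in pairs])
        groups = []
        for group in by_first:
            groups.extend(d2(group))   # refine each group by the second component
        return groups

    return run


def disc_list(d):
    """Discriminator for list keys, built from a discriminator for the elements."""
    def go(items, i):
        # Invariant: all keys in `items` agree on their first i elements.
        if not items:
            return []
        if len(items) == 1:
            return [[items[0][1]]]
        ended = [value for key, value in items if len(key) == i]
        rest = [(key[i], (key, value)) for key, value in items if len(key) > i]
        groups = [ended] if ended else []
        for group in d(rest):
            groups.extend(go(group, i + 1))
        return groups

    return lambda pairs: go(pairs, 0)


# Usage: find duplicate words by discriminating on their character codes.
words = ["ab", "c", "ab", "", "c", "abc"]
disc_words = disc_list(disc_nat(256))
groups = disc_words([([ord(ch) for ch in w], i) for i, w in enumerate(words)])
duplicates = [g for g in groups if len(g) > 1]
```

Each combinator follows the shape of the key type (integer, pair, list), which is the sense in which the definition is inductive on the type structure. The recursion in `disc_list` is as deep as the longest key, so very long keys would need an iterative formulation in Python.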