## Generic Discrimination -- Sorting and Partitioning Unshared Data in Linear Time (2008)

Citations: 5 (4 self)

### BibTeX

@MISC{Henglein08genericdiscrimination,
  author = {Fritz Henglein},
  title = {Generic Discrimination -- Sorting and Partitioning Unshared Data in Linear Time},
  year = {2008}
}

### Abstract

We introduce the notion of discrimination as a generalization of both sorting and partitioning and show that worst-case linear-time discrimination functions (discriminators) can be defined generically, by (co-)induction on an expressive language of order denotations. The generic definition yields discriminators that generalize both distributive sorting and multiset discrimination. The generic discriminator can be coded compactly using list comprehensions, with order denotations specified using Generalized Algebraic Data Types (GADTs). A GADT-free combinator formulation of discriminators is also given. We give some examples of the uses of discriminators, including a new most-significant-digit lexicographic sorting algorithm. Discriminators generalize binary comparison functions: they operate on n arguments at a time, but do not expose more information than the underlying equivalence, respectively ordering relation on the arguments. We argue that primitive types with equality (such as references in ML) and ordered types (such as the machine integer type) should expose their equality, respectively standard ordering relation, as discriminators: having only a binary equality test on a type requires Θ(n²) time to find all the occurrences of an element in a list of length n, for each element in the list, even if the equality test takes only constant time. A discriminator accomplishes this in linear time. Likewise, having only a (constant-time) comparison function requires Θ(n log n) time to sort a list of n elements. A discriminator can do this in linear time.
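The combinator view described in the abstract can be illustrated with a small Haskell sketch. It is not the paper's code: the names `Disc`, `discNat`, `discMap`, `discPair`, `discSort`, and `discPositions` are hypothetical, and the bucket-table base case built on `accumArray` is an assumption chosen for brevity. The sketch only aims to show how a discriminator groups values by key without exposing the keys, and how sorting and finding all occurrences fall out of it.

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.Array (accumArray, elems)

-- A discriminator groups the values of key-value pairs by their keys:
-- values with equivalent keys land in the same group, and groups are
-- listed in ascending key order. The keys themselves are not returned.
newtype Disc k = Disc { runDisc :: forall v. [(k, v)] -> [[v]] }

-- Base case (assumed for this sketch): keys in the range [0, n), handled
-- with a bucket table. Time is O(length input + n). Keys outside the range
-- raise an error. Feeding the input reversed keeps each group stable.
discNat :: Int -> Disc Int
discNat n = Disc (\kvs ->
  let buckets = accumArray (\vs v -> v : vs) [] (0, n - 1) (reverse kvs)
  in  filter (not . null) (elems buckets))

-- Discriminate according to the image of the keys under f.
discMap :: (k -> k') -> Disc k' -> Disc k
discMap f d = Disc (\kvs -> runDisc d [ (f k, v) | (k, v) <- kvs ])

-- Lexicographic discrimination of pairs: split on the first component,
-- then refine each resulting group on the second component.
discPair :: Disc a -> Disc b -> Disc (a, b)
discPair d1 d2 = Disc (\kvs ->
  [ vs | ys <- runDisc d1 [ (a, (b, v)) | ((a, b), v) <- kvs ]
       , vs <- runDisc d2 ys ])

-- Sorting: use each element as both key and value, then flatten the groups.
discSort :: Disc k -> [k] -> [k]
discSort d ks = concat (runDisc d [ (k, k) | k <- ks ])

-- All occurrences of every element: group list positions by element.
discPositions :: Disc k -> [k] -> [[Int]]
discPositions d ks = runDisc d (zip ks [0 ..])
```

Under these assumptions, evaluating `discPositions (discNat 10) [3,1,3,0]` in GHCi should give `[[3],[1],[0,2]]`: one pass over the input plus one pass over the bucket table, rather than a quadratic number of pairwise equality tests. The rank-2 type on `runDisc` is what prevents a discriminator from inspecting or returning anything about the values it is grouping.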

### Citations

538 | Sorting networks and their applications
- Batcher
- 1968
Citation Context [....4) Θ(N²); Heapsort (Williams 1964) Θ(N²); Selection sort (Knuth 1998, Sec. 5.2.3) Θ(N³); Insertion sort (Knuth 1998, Sec. 5.2.1) Θ(N²); Bubble sort (Knuth 1998, Sec. 5.2.2) Θ(N²); Bitonic sort (Batcher 1968) Θ(N log² N); Shell sort (Shell 1959) Θ(N log² N); Odd-even mergesort (Batcher 1968) Θ(N log² N); AKS sorting network (Ajtai et al. 1983) Θ(N log N)] sequential computer. (Indeed we only require point...

212 | The Art of Computer Programming, Sorting and Searching
- Knuth
- 1997
Citation Context ...th fixed word width, say 32 or 64 bits, corresponding to a conventional [Table 1. Comparison-based sorting algorithms for complex data (Sort, Time complexity): Quicksort (Hoare 1961) Θ(N²); Mergesort (Knuth 1998, Sec. 5.2.4) Θ(N²); Heapsort (Williams 1964) Θ(N²); Selection sort (Knuth 1998, Sec. 5.2.3) Θ(N³); Insertion sort (Knuth 1998, Sec. 5.2.1) Θ(N²); Bubble sort (Knuth 1998, Sec. 5.2.2) Θ(N²); Bito...

154 | Algorithm 232: Heapsort
- Williams
- 1964
Citation Context ...rresponding to a conventional [Table 1. Comparison-based sorting algorithms for complex data (Sort, Time complexity): Quicksort (Hoare 1961) Θ(N²); Mergesort (Knuth 1998, Sec. 5.2.4) Θ(N²); Heapsort (Williams 1964) Θ(N²); Selection sort (Knuth 1998, Sec. 5.2.3) Θ(N³); Insertion sort (Knuth 1998, Sec. 5.2.1) Θ(N²); Bubble sort (Knuth 1998, Sec. 5.2.2) Θ(N²); Bitonic sort (Batcher 1968) Θ(N log² N); Shell so...

150 | Data Structures and Algorithms 1: Sorting and Searching
- Mehlhorn
- 1984
Citation Context ...that is, for each input string the minimum prefix required to distinguish the string from all other input strings. (If a string occurs twice, all characters are inspected.) It has the known weakness (Mehlhorn 1984), however, that there are usually many calls to the Char-discriminator with only few arguments. The Char-discriminator returns its input by traversing an array, the bucket table, of some fixed size m...

115 | Sorting in c log n parallel steps
- Ajtai, Komlós, et al.
- 1983
Citation Context [... Θ(N²); Bubble sort (Knuth 1998, Sec. 5.2.2) Θ(N²); Bitonic sort (Batcher 1968) Θ(N log² N); Shell sort (Shell 1959) Θ(N log² N); Odd-even mergesort (Batcher 1968) Θ(N log² N); AKS sorting network (Ajtai et al. 1983) Θ(N log N)] sequential computer. (Indeed we only require pointer operations – the random access is not required for our complexity results to hold.) In this setting the only meaningful measure of the...

93 | Polytypic programming
- Jeuring, Jansson
- 1996
Citation Context ...enotations, order denotations can easily be eliminated by partial evaluation, which results in a combinator library for discriminators. This can be thought of as an exercise in polytypic programming (Jeuring and Jansson 1996; Hinze 2000), extended from type denotations (one per type) to order denotations (many per type). -- Discriminator combinators derived from generic definition -- NB: For simplicity without shortcut c...

86 | Sorting in linear time - Andersson, Hagerup, et al. - 1995 |

36 | Using multiset discrimination to solve language processing problems without hashing
- Cai, Paige
- 1995
Citation Context ...h means m dominates for small values of n. If the output does not need to be alphabetically sorted traversal time can be made independent of the array size by employing basic multiset discrimination (Cai and Paige 1995, Section 2.2). This motivated Paige and Tarjan to break lexicographic sorting into two phases: In the first phase they identify equal elements, but do not return them in sorted order; instead they bu...

35 | Generalizing generalized tries - Hinze - 2000 |

31 | A New Efficient Radix Sort
- Andersson, Nilsson
- 1994
Citation Context ...rrectness of this code exploits the fact that disc is stable. It works for all order denotations. Restricted to the standard lexicographic ordering on strings it is the key idea in Forward Radixsort (Andersson and Nilsson 1994, 1998). It can also be thought of as a local application of least-significant-digit (LSD) sorting, expressed in terms of discriminators. Going from processing one group at a time to processing all of...

31 | A high-speed sorting procedure
- Shell
- 1959
Citation Context [...N²); Selection sort (Knuth 1998, Sec. 5.2.3) Θ(N³); Insertion sort (Knuth 1998, Sec. 5.2.1) Θ(N²); Bubble sort (Knuth 1998, Sec. 5.2.2) Θ(N²); Bitonic sort (Batcher 1968) Θ(N log² N); Shell sort (Shell 1959) Θ(N log² N); Odd-even mergesort (Batcher 1968) Θ(N log² N); AKS sorting network (Ajtai et al. 1983) Θ(N log N)] sequential computer. (Indeed we only require pointer operations – the random access is ...

28 | Integer sorting in O(n √(log log n)) expected time and linear space - Han, Thorup |

20 | Static dictionaries on AC⁰ RAMs: Query time Θ(√(log n / log log n)) is necessary and sufficient
- Andersson, Miltersen, et al.
- 1996
Citation Context ...ords occupied than the number of bits; on the other hand, this is partially offset by having access to word-level parallelism, RAM-operations such as bitshift, addition or generally AC⁰-operations (Andersson et al. 1996), which have shallow circuit implementations. 6 Applications. We present some applications of discrimination (including its variations as sorting and partitioning functions), which are intended to ill...

17 | Efficient translation of external input in a dynamically typed language. IFIP 94 proceedings
- Paige
- 1994
Citation Context ...ly been introduced and developed as an algorithmic tool set for efficiently partitioning and preprocessing data according to certain equivalence relations on strings and trees (Paige and Tarjan 1987; Paige 1994; Cai and Paige 1995; Paige and Yang 1997). We have shown how to analyze multiset discrimination into its functional core components, identifying the notion of discriminator as the core abstraction, a...

16 | Algorithm 63: partition
- Hoare
- 1961
Citation Context ...n is a random-access machine with fixed word width, say 32 or 64 bits, corresponding to a conventional [Table 1. Comparison-based sorting algorithms for complex data (Sort, Time complexity): Quicksort (Hoare 1961) Θ(N²); Mergesort (Knuth 1998, Sec. 5.2.4) Θ(N²); Heapsort (Williams 1964) Θ(N²); Selection sort (Knuth 1998, Sec. 5.2.3) Θ(N³); Insertion sort (Knuth 1998, Sec. 5.2.1) Θ(N²); Bubble sort (Knuth...

13 | Surpassing the information-theoretic bound with fusion trees - Fredman, Willard - 1993 |

11 | Efficient Type Matching
- Jha, Palsberg, et al.
- 2002
Citation Context ...and indeed part prod3 does not run in linear time: it takes exponential time! It has been shown that this problem can be solved in linear time over tree (unboxed) representations of type expressions (Jha et al. 2008) by applying bottom-up multiset discrimination for trees with weak sorting (Paige 1991). For pairs of types this has also been proved separately (Zibin et al. 2003), where basic multiset discriminati...

8 | Radix sorting with no extra space - Franceschini, Muthukrishnan, et al. - 2007 |

8 | Efficient Algorithms for Isomorphisms of Simple Types - Zibin, Gil, et al. - 2003 |

5 | a faster in-place, cache friendly sorting algorithm - ARL - 2002 |

4 | Multiset discrimination for internal and external data management
- Ambus
- 2004
Citation Context ...ige 1995), which does not incur the penalty of traversal of empty buckets, breadth-first group processing has been observed to have noticeably worse practical performance than depth-first processing (Ambus 2004, Section 2.4). We conjecture that concatenating not all groups ys returned by disc r1 in the defining clause for disc (Pair r1 r2), but just as many as is necessary to fill the bucket table to “pay” ...

3 | Efficient trie-based sorting of large sets of strings
- Sinha, Zobel
- 2003
Citation Context ...ure such as a trie may at first appear too expensive to be useful in practice, a similar two-phase approach is taken in what is claimed to be the fastest string sorting algorithm for large data sets (Sinha and Zobel 2003). Another solution is possible, however, which does not require building a trie for the entire input. Consider the code for discrimination of pairs: disc (Pair r1 r2) xs = [ vs | ys ← disc r1 [(k1,(k...

2 | Aha! Algorithms
- Bentley
- 1983
Citation Context ... also use toLower instead of toUpper, which illustrates that the same order may have multiple denotations. 6.2 Anagram classes. A classical problem treated by Bentley in his programming pearls series (Bentley 1983) is anagram classes: Given a list of words from a dictionary find their anagram classes; that is find all words that are permutations of each other and do this for all the words in the dictionary. Th...

1 | The Glasgow Haskell Compiler. http://www.haskell.org/ghc - Haskell - 2005 |

1 | Multiset discrimination. Unpublished manuscript. See http://plan-x.org/msd/multiset-discrimination.pdf
- Henglein
- 2003
Citation Context ...down discrimination embodied in our generic discriminator gives asymptotically optimal performance only for unshared data. Dealing with sharing requires bottom-up multiset discrimination (Paige 1991; Henglein 2003). Our computational model is a pointer machine with basic operations operating on constant-sized data. In particular, operations on pairs (construction, projection), tagged values (tagging, pattern m...

1 | A language for total preorders. Unfinished manuscript
- Henglein
- 2008
Citation Context ...notations with free variables denote order-mapping functions, which constitute the morphisms in the category TPreorder of total preorders, which in turn admits (least) fixed points as inverse limits (Henglein 2008). -- Fix rf : fixed-point of order constructor rf Fix :: (Order t → Order t) → Order t The Haskell type of Fix allows it to be applied to arbitrary computational functions mapping order denotations. ...

1 | Thomas Ambus. Multiset discrimination for internal and external data management - Informatica - 2004 |

1 | Multiset discrimination — a method for implementing programming language systems without hashing. Theoretical Computer Science (TCS
- Cai, Paige
- 1994
Citation Context ...ich in essence is just bucket sorting. If the output does not need to be alphabetically sorted traversal time can be made independent of the array size since, employing basic multiset discrimination (Cai and Paige 1994, Section 2.2), the traversal of empty array cells in bucket sorting can be avoided. This motivated Paige and Tarjan to break lexicographic sorting into two phases: In the first phase they identify eq...

1 | ISBN 0-262-03141-8 (MIT Press) and ISBN 0-07-013143-0 (McGraw-Hill). 23 - Press, McGraw-Hill - 1990 |

1 | Three partition refinement algorithms - Draft - 1991 |