Results 1 
3 of
3
Generic Discrimination  Sorting and Partitioning Unshared Data in Linear Time
, 2008
"... We introduce the notion of discrimination as a generalization of both sorting and partitioning and show that worstcase lineartime discrimination functions (discriminators) can be defined generically, by (co)induction on an expressive language of order denotations. The generic definition yields di ..."
Abstract

Cited by 4 (3 self)
 Add to MetaCart
We introduce the notion of discrimination as a generalization of both sorting and partitioning and show that worstcase lineartime discrimination functions (discriminators) can be defined generically, by (co)induction on an expressive language of order denotations. The generic definition yields discriminators that generalize both distributive sorting and multiset discrimination. The generic discriminator can be coded compactly using list comprehensions, with order denotations specified using Generalized Algebraic Data Types (GADTs). A GADTfree combinator formulation of discriminators is also given. We give some examples of the uses of discriminators, including a new mostsignificantdigit lexicographic sorting algorithm. Discriminators generalize binary comparison functions: They operate on n arguments at a time, but do not expose more information than the underlying equivalence, respectively ordering relation on the arguments. We argue that primitive types with equality (such as references in ML) and ordered types (such as the machine integer type), should expose their equality, respectively standard ordering relation, as discriminators: Having only a binary equality test on a type requires Θ(n 2) time to find all the occurrences of an element in a list of length n, for each element in the list, even if the equality test takes only constant time. A discriminator accomplishes this in linear time. Likewise, having only a (constanttime) comparison function requires Θ(n log n) time to sort a list of n elements. A discriminator can do this in linear time.
Generic topdown discrimination
, 2009
"... We introduce the notion of discrimination as a generalization of both sorting and partitioning and show that discriminators (discrimination functions) can be defined generically, by structural recursion on order and equivalence expressions denoting a rich class of total preorders and equivalence rel ..."
Abstract

Cited by 2 (2 self)
 Add to MetaCart
We introduce the notion of discrimination as a generalization of both sorting and partitioning and show that discriminators (discrimination functions) can be defined generically, by structural recursion on order and equivalence expressions denoting a rich class of total preorders and equivalence relations, respectively. Discriminators improve the asymptotic performance of generic comparisonbased sorting and partitioning, yet do not expose more information than the underlying ordering relation, respectively equivalence. For a large class of order and equivalence expressions, including all standard orders for firstorder recursive types, the discriminators execute in worstcase linear time. The generic discriminators can be coded compactly using list comprehensions, with order expressions specified using Generalized Algebraic Data Types (GADTs). We give some examples of the uses of discriminators, including a new mostsignificantdigit lexicographic sorting algorithm and type isomorphism with an associativecommutative operator. Full source code of discriminators and their applications is included. 1 We argue discriminators should be basic operations for primitive and abstract types with equality. The basic multiset discriminator for references, originally due to Paige et al., is shown to be both efficient and fully abstract: it finds all duplicates of all references occurring in a list in linear time without leaking information about their representation. In particular, it behaves deterministically in the presence of garbage collection and nondeterministic heap allocation even when references are represented as raw machine addresses. In contrast, having only a binary equality test as in ML requires Θ(n 2) time, and allowing hashing for performance reasons as in Java, makes execution nondeterministic and complicates garbage collection.
Using random sampling to build approximate tries for efficient string sorting
 In Proc. International Workshop on Efficient and Experimental
"... Abstract. Algorithms for sorting large datasets can be made more efficient with careful use of memory hierarchies and reduction in the number of costly memory accesses. In earlier work, we introduced burstsort, a new string sorting algorithm that on large sets of strings is almost twice as fast as p ..."
Abstract

Cited by 1 (1 self)
 Add to MetaCart
Abstract. Algorithms for sorting large datasets can be made more efficient with careful use of memory hierarchies and reduction in the number of costly memory accesses. In earlier work, we introduced burstsort, a new string sorting algorithm that on large sets of strings is almost twice as fast as previous algorithms, primarily because it is more cacheefficient. The approach in burstsort is to dynamically build a small trie that is used to rapidly allocate each string to a bucket. In this paper, we introduce new variants of our algorithm: SRburstsort, DRburstsort, and DRLburstsort. These algorithms use a random sample of the strings to construct an approximation to the trie prior to sorting. Our experimental results with sets of over 30 million strings show that the new variants reduce cache misses further than did the original burstsort, by up to 37%, while simultaneously reducing instruction counts by up to 24%. In pathological cases, even further savings can be obtained. 1