Results 1-10 of 14
IP-Address Lookup Using LC-Tries
, 1998
Abstract

Cited by 100 (0 self)
There has recently been notable interest in the organization of routing information to enable fast lookup of IP addresses. The interest is primarily motivated by the goal of building multi-Gb/s routers for the Internet without having to rely on multilayer switching techniques. We address this problem by using an LC-trie, a trie structure with combined path and level compression. This data structure enables us to build efficient, compact and easily searchable implementations of an IP routing table. The structure can store both unicast and multicast addresses with the same average search times. The search depth increases as Θ(log log n) with the number of entries in the table for a large class of distributions, and it is independent of the length of the addresses. A node in the trie can be coded with four bytes. Only the size of the base vector, which contains the search strings, grows linearly with the length of the addresses when extended from 4 to 16 bytes, as mandated by the shift from IP version 4 to version 6. We present the basic structure, as well as an adaptive version that roughly doubles the number of lookups per second. More general classifications of packets that are needed for link sharing, quality-of-service provisioning, and for multicast and multipath routing are also discussed. Our experimental results compare favorably with those reported previously in the research literature.
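The operation the paper accelerates is longest-prefix matching in a routing table. As a point of reference, a plain binary trie (without the LC-trie's path and level compression) can be sketched as below; `TrieNode`, `insert`, and `lookup` are illustrative names, not the paper's API:

```python
# Minimal binary trie for IP longest-prefix match; prefixes and addresses
# are given as bit strings for clarity. The paper's LC-trie collapses
# single-child paths (path compression) and dense subtries (level
# compression) on top of this basic structure.

class TrieNode:
    def __init__(self):
        self.children = {}   # bit ('0'/'1') -> TrieNode
        self.next_hop = None # set if a routing prefix ends at this node

def insert(root, prefix_bits, next_hop):
    node = root
    for b in prefix_bits:
        node = node.children.setdefault(b, TrieNode())
    node.next_hop = next_hop

def lookup(root, addr_bits):
    """Return the next hop of the longest matching prefix, or None."""
    node, best = root, None
    for b in addr_bits:
        if node.next_hop is not None:
            best = node.next_hop     # remember the deepest match so far
        node = node.children.get(b)
        if node is None:
            break
    else:
        if node.next_hop is not None:
            best = node.next_hop
    return best
```

Path compression would replace each single-child chain with a skip count, and level compression would replace each complete subtrie of depth k with a single 2^k-way node, which is where the Θ(log log n) depth comes from.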
Implementing Sorting in Database Systems
 ACM Comput. Surv
, 2006
Abstract

Cited by 15 (4 self)
Most commercial database systems do (or should) exploit many sorting techniques that are publicly known but not readily available in the research literature. These techniques improve both sort performance on modern computer systems and the ability to adapt gracefully to resource fluctuations in multi-user operations. This survey collects many of these techniques for easy reference by students, researchers, and product developers. It covers in-memory sorting, disk-based external sorting, and considerations that apply specifically to sorting in database systems.
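As a minimal illustration of the external-sorting side of the survey, the classic run-formation-plus-merge scheme looks like this; in-memory lists stand in for disk runs, and `RUN_SIZE` is an illustrative constant, not a value from the survey:

```python
# Sketch of disk-based external merge sort: sort memory-sized runs,
# then merge all runs with a multiway merge (a priority queue over the
# run heads, which heapq.merge implements).
import heapq

RUN_SIZE = 4  # records that "fit in memory"; tiny for illustration

def external_sort(records):
    runs = []
    for i in range(0, len(records), RUN_SIZE):
        runs.append(sorted(records[i:i + RUN_SIZE]))  # run formation
    return list(heapq.merge(*runs))                   # multiway merge
```

Real implementations add the refinements the survey catalogues: replacement selection to lengthen runs, forecasting of the next needed block, and merge fan-in limited by available buffers.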
Cache-Conscious Sorting of Large Sets of Strings with Dynamic Tries
Abstract

Cited by 11 (4 self)
Ongoing changes in computer performance are affecting the efficiency of string sorting algorithms. The size of main memory in typical computers continues to grow, but memory accesses require increasing numbers of instruction cycles, which is a problem for the most efficient of the existing string-sorting algorithms, as they do not utilise the cache particularly well for large data sets. We propose a new sorting algorithm for strings, burstsort, based on dynamic construction of a compact trie in which strings are kept in buckets. It is simple, fast, and efficient. We experimentally compare burstsort to existing string-sorting algorithms on large and small sets of strings with a range of characteristics. These experiments show that, for large sets of strings, burstsort is almost twice as fast as any previous algorithm, due primarily to a lower rate of cache misses.
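The bucket-trie idea behind burstsort can be sketched in a few lines. This is an illustrative reconstruction assuming a dict-based trie and a tiny burst threshold; the actual algorithm uses carefully tuned bucket representations for cache efficiency:

```python
# Burstsort sketch: route each string through a trie of buckets keyed on
# successive characters; when a bucket overflows, "burst" it into a
# deeper trie node. An in-order traversal then emits the sorted output,
# sorting each small bucket individually (in cache, in the real thing).

BURST_LIMIT = 4  # illustrative threshold; real burstsort tunes this

def burstsort(strings):
    root = {}  # char -> subtrie (dict) or bucket (list); '' holds exhausted strings
    for s in strings:
        _insert(root, s, 0)
    out = []
    _collect(root, out)
    return out

def _insert(node, s, depth):
    c = s[depth] if depth < len(s) else ''
    if c == '':
        node.setdefault('', []).append(s)  # string fully consumed at this depth
        return
    slot = node.get(c)
    if slot is None:
        node[c] = [s]
    elif isinstance(slot, list):
        slot.append(s)
        if len(slot) > BURST_LIMIT:        # burst: bucket becomes a subtrie
            sub = {}
            for t in slot:
                _insert(sub, t, depth + 1)
            node[c] = sub
    else:
        _insert(slot, s, depth + 1)

def _collect(node, out):
    for c in sorted(node):                 # '' sorts before every character
        slot = node[c]
        if c == '':
            out.extend(slot)
        elif isinstance(slot, list):
            out.extend(sorted(slot))       # small bucket: sort it directly
        else:
            _collect(slot, out)
```

The cache benefit comes from the buckets: each string is touched once on insertion and once inside a bucket small enough to sort without leaving the cache.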
Using masks, suffix array-based data structures, and multidimensional arrays to compute positional n-gram statistics from corpora
 In Proceedings of the Workshop on Multiword Expressions of the 41st Annual Meeting of the Association for Computational Linguistics
Abstract

Cited by 9 (4 self)
This paper describes an implementation to compute positional n-gram statistics (i.e. frequency and mutual expectation) based on masks, suffix array-based data structures, and multidimensional arrays. Positional n-grams are ordered sequences of words that represent continuous or discontinuous substrings of a corpus. In particular, the positional n-gram model has shown successful results for the extraction of discontinuous collocations from large corpora. However, its computation is heavy. For instance, 4,299,742 positional n-grams (n = 1..7) can be generated from a 100,000-word corpus in a seven-word window context. In comparison, only 700,000 n-grams would be computed for the classical n-gram model. It is clear that huge efforts need to be made to process positional n-gram statistics in reasonable time and space. Our solution shows O(h(F) N log N) time complexity, where N is the corpus size and h(F) is a function of the window context.
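The counting itself can be sketched briefly. This is an illustrative reconstruction, not the paper's mask/suffix-array implementation; it anchors every n-gram at the first word of its window, so discontinuous sequences are represented by their word offsets:

```python
# Count positional n-grams: sequences of up to max_n words, continuous
# or discontinuous, inside a sliding window, anchored at the window's
# first word. Each gram is a tuple of (offset, word) pairs, so
# ((0, 'a'), (1, 'b')) and ((0, 'a'), (2, 'b')) are distinct grams.
from collections import Counter
from itertools import combinations

def positional_ngrams(tokens, window=3, max_n=2):
    counts = Counter()
    for start in range(len(tokens) - window + 1):
        span = tokens[start:start + window]
        for n in range(1, max_n + 1):
            # choose the n-1 remaining offsets after the anchor offset 0
            for rest in combinations(range(1, window), n - 1):
                gram = tuple((p, span[p]) for p in (0,) + rest)
                counts[gram] += 1
    return counts
```

The combinatorial blow-up the abstract describes is visible here: each window position contributes one gram per subset of offsets, which is why the paper needs compact suffix-array-based structures rather than an explicit table like this one.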
Cache-efficient string sorting using copying
 In submission
, 2006
Abstract

Cited by 8 (3 self)
Burstsort is a cache-oriented sorting technique that uses a dynamic trie to efficiently divide large sets of string keys into related subsets small enough to sort in cache. In our original burstsort, string keys sharing a common prefix were managed via a bucket of pointers represented as a list or array; this approach was found to be up to twice as fast as the previous best string sorts, mostly because of a sharp reduction in out-of-cache references. In this paper we introduce C-burstsort, which copies the unexamined tail of each key to the bucket and discards the original key to improve data locality. On both Intel and PowerPC architectures, and on a wide range of string types, we show that sorting is typically twice as fast as our original burstsort, and four to five times faster than multikey quicksort and previous radix sorts. A variant that copies both suffixes and record pointers to buckets, CP-burstsort, uses more memory but provides stable sorting. In current computers, where performance is limited by memory access latencies, these new algorithms can dramatically reduce the time needed for internal sorting of large numbers of strings.
Generic top-down discrimination
, 2009
Abstract

Cited by 2 (2 self)
We introduce the notion of discrimination as a generalization of both sorting and partitioning and show that discriminators (discrimination functions) can be defined generically, by structural recursion on order and equivalence expressions denoting a rich class of total preorders and equivalence relations, respectively. Discriminators improve the asymptotic performance of generic comparison-based sorting and partitioning, yet do not expose more information than the underlying ordering relation or equivalence. For a large class of order and equivalence expressions, including all standard orders for first-order recursive types, the discriminators execute in worst-case linear time. The generic discriminators can be coded compactly using list comprehensions, with order expressions specified using Generalized Algebraic Data Types (GADTs). We give some examples of the uses of discriminators, including a new most-significant-digit lexicographic sorting algorithm and type isomorphism with an associative-commutative operator. Full source code of discriminators and their applications is included. We argue that discriminators should be basic operations for primitive and abstract types with equality. The basic multiset discriminator for references, originally due to Paige et al., is shown to be both efficient and fully abstract: it finds all duplicates of all references occurring in a list in linear time without leaking information about their representation. In particular, it behaves deterministically in the presence of garbage collection and nondeterministic heap allocation even when references are represented as raw machine addresses. In contrast, having only a binary equality test as in ML requires Θ(n²) time, and allowing hashing for performance reasons, as in Java, makes execution nondeterministic and complicates garbage collection.
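A toy analogue of a top-down discriminator can be written by structural recursion on the key: atomic keys are bucketed in one pass, and tuple (product) keys are discriminated component by component. This is only a sketch of the idea; the paper works with typed order expressions in Haskell, and Python's insertion-ordered dict here stands in for the array-based bucketing of Paige-style multiset discrimination:

```python
# disc takes (key, value) pairs and returns the values grouped by key,
# groups ordered by first occurrence of their key, without ever calling
# a binary comparison on whole keys.

def disc(pairs):
    pairs = list(pairs)
    if not pairs:
        return []
    k0 = pairs[0][0]
    if isinstance(k0, tuple):
        if len(k0) == 0:
            return [[v for _, v in pairs]]   # unit key: everything equal
        # discriminate on the head, carrying the tail along as part of
        # the value; then discriminate each group recursively on the tail
        head_groups = disc([(k[0], (k[1:], v)) for k, v in pairs])
        out = []
        for g in head_groups:
            out.extend(disc(g))
        return out
    # atomic key (int or str): a single bucketing pass
    buckets = {}
    for k, v in pairs:
        buckets.setdefault(k, []).append(v)
    return list(buckets.values())
```

Sorting and partitioning both fall out: concatenating the groups of a sorted-order discriminator yields a sort, while the groups themselves are the equivalence classes.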
Making a fast unstable sorting algorithm stable
Abstract

Cited by 2 (2 self)
This paper demonstrates how an unstable in-place sorting algorithm, the ALR algorithm, can be made stable by temporarily changing the sorting keys during the recursion. At 'the bottom of the recursion', all subsequences with equal-valued elements are then individually sorted with a stable sorting subalgorithm (insertion sort or radix). Later, on backtrack, the original keys are restored. This results in a stable sorting of the whole input. Unstable ALR is much faster than Quicksort (which is also unstable). In this paper it is demonstrated that Stable ALR, which is some 10-30% slower than the original unstable ALR, is still in most cases 20-60% faster than Quicksort. It is also shown to be faster than Flashsort, a new unstable in-place, bucket-type sorting algorithm. This is demonstrated for five different distributions of integers in arrays of lengths from 50 to 97 million elements. The Stable ALR sorting algorithm can be extended to sort floating-point numbers and strings and to make effective use of a multi-core CPU. Keywords: stable sorting, radix, most significant radix, multi-core CPU, Quicksort, Flashsort, ALR.
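The paper's key-rewriting trick is specific to ALR, but the standard alternative it improves on, stabilizing any unstable sort by decorating records with their original position, can be sketched as follows (the in-place quicksort here is just an illustrative stand-in for the unstable sort):

```python
# Decorate-sort-undecorate: ties on the key are broken by the original
# index i, so any correct (even unstable) sort of the decorated tuples
# yields a stable order of the records. The cost is the extra memory
# and comparison work that the paper's in-place key rewriting avoids.

def stable_via_index(records, key, unstable_sort):
    decorated = [(key(r), i, r) for i, r in enumerate(records)]
    unstable_sort(decorated)   # index i makes every tuple's prefix unique
    return [r for _, _, r in decorated]

def quicksort(a, lo=0, hi=None):
    """Plain in-place quicksort (Hoare partition); unstable in general."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    pivot = a[(lo + hi) // 2]
    i, j = lo, hi
    while i <= j:
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    quicksort(a, lo, j)
    quicksort(a, i, hi)
```

Because the index makes each decorated prefix unique, the comparison never reaches the record itself, so records need not be comparable.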
Using random sampling to build approximate tries for efficient string sorting
 In Proc. International Workshop on Efficient and Experimental
Abstract

Cited by 1 (1 self)
Algorithms for sorting large datasets can be made more efficient with careful use of memory hierarchies and a reduction in the number of costly memory accesses. In earlier work, we introduced burstsort, a new string sorting algorithm that on large sets of strings is almost twice as fast as previous algorithms, primarily because it is more cache-efficient. The approach in burstsort is to dynamically build a small trie that is used to rapidly allocate each string to a bucket. In this paper, we introduce new variants of our algorithm: SRburstsort, DRburstsort, and DRLburstsort. These algorithms use a random sample of the strings to construct an approximation to the trie prior to sorting. Our experimental results with sets of over 30 million strings show that the new variants reduce cache misses further than did the original burstsort, by up to 37%, while simultaneously reducing instruction counts by up to 24%. In pathological cases, even further savings can be obtained.
Fast String Sorting Using Order-Preserving Compression
Abstract
We give experimental evidence for the benefits of order-preserving compression in sorting algorithms. While, in general, any algorithm might benefit from compressed data because of reduced paging requirements, we identified two natural candidates that would further benefit from order-preserving compression, namely string-oriented sorting algorithms and word-RAM algorithms for keys of bounded length. The word-RAM model has some of the fastest known sorting algorithms in practice. These algorithms are designed for keys of bounded length, usually 32 or 64 bits, which limits their direct applicability to strings. One possibility is to use an order-preserving compression scheme, so that a bounded-key-length algorithm can be applied. For the case of standard algorithms, we took what is considered to be among the fastest non-word-RAM string sorting algorithms, Fast MKQSort, and measured its performance on compressed data. The Fast MKQSort algorithm of Bentley and Sedgewick is optimized to handle text strings. Our experiments show that order-compression techniques result in savings of approximately 15% over the same algorithm on non-compressed data. For the word-RAM, we modified Andersson's sorting algorithm to handle variable-length keys. The resulting algorithm is faster than the standard Unix sort by a factor of 1.5x. Last, we used an order-preserving scheme that is within a constant additive term
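The bounded-key idea can be seen with a simple order-preserving map from string prefixes into machine words, so that a fixed-width sorter does most of the work. This sketch uses plain zero-padded byte packing rather than the compression schemes the paper evaluates; `WIDTH` and `to_word` are illustrative names, and strings that agree on their first WIDTH bytes still need a tie-break pass on the remainder:

```python
# Order-preserving packing of a byte-string prefix into a fixed-width
# integer: for any two strings that differ within the first WIDTH bytes,
# to_word(a) < to_word(b) iff a < b, so a word-RAM integer sorter can be
# applied to the packed keys.

WIDTH = 8  # bytes packed per machine word (64-bit word assumed)

def to_word(s: bytes) -> int:
    """Pack up to WIDTH leading bytes, zero-padding short strings.
    Big-endian order makes integer comparison match byte-wise
    lexicographic comparison."""
    return int.from_bytes(s[:WIDTH].ljust(WIDTH, b'\x00'), 'big')
```

An actual order-preserving compression scheme (the paper's subject) would assign shorter codes to frequent symbols while keeping code order equal to symbol order, fitting more distinguishing information into each word than this raw packing does.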
Implementation of Sorting in Database Systems
Abstract
It has often been said that sorting algorithms are very instructional in their own right as well as representative of a variety of computer algorithms, and that the performance of sorting is indicative of the performance of a variety of other data management tasks. Therefore, there is a fair amount of literature on the theory of sorting as well as on specific benchmark results. On the other hand, most commercial implementations of sorting do (or should!) exploit many techniques that are publicly known but not readily available in the research literature. This survey collects them for easy reference by students, researchers, and product developers. Its main purpose is not to introduce new algorithmic techniques or to evaluate experimentally the effectiveness of any one individual technique; instead, it gathers and organizes such techniques in order to enable, stimulate, and focus future research and development.