Results 1  10
of
19
IPAddress Lookup Using LCTries
, 1998
"... There has recently been a notable interest in the organization of routing information to enable fast lookup of IP addresses. The interest is primarily motivated by the goal of building multiGb/s routers for the Internet, without having to rely on multilayer switching techniques. We address this ..."
Abstract

Cited by 102 (0 self)
 Add to MetaCart
There has recently been a notable interest in the organization of routing information to enable fast lookup of IP addresses. The interest is primarily motivated by the goal of building multiGb/s routers for the Internet, without having to rely on multilayer switching techniques. We address this problem by using an LCtrie, a trie structure with combined path and level compression. This data structure enables us to build efficient, compact and easily searchable implementations of an IP routing table. The structure can store both unicast and multicast addresses with the same average search times. The search depth increases as \Theta (log log n) with the number of entries in the table for a large class of distributions and it is independent of the length of the addresses. A node in the trie can be coded with four bytes. Only the size of the base vector, which contains the search strings, grows linearly with the length of the addresses when extended from 4 to 16 bytes, as mandated by the shift from IP version 4 to version 6. We present the basic structure, as well as an adaptive version that roughly doubles the number of lookups per second. More general classifications of packets that are needed for link sharing, quality of service provisioning and for multicast and multipath routing are also discussed. Our experimental results compare favorably with those reported previously in the research literature.
Implementing Sorting in Database Systems
 ACM Comput. Surv
, 2006
"... Most commercial database systems do (or should) exploit many sorting techniques that are publicly known, but not readily available in the research literature. These techniques improve both sort performance on modern computer systems and the ability to adapt gracefully to resource fluctuations in mul ..."
Abstract

Cited by 15 (4 self)
 Add to MetaCart
Most commercial database systems do (or should) exploit many sorting techniques that are publicly known, but not readily available in the research literature. These techniques improve both sort performance on modern computer systems and the ability to adapt gracefully to resource fluctuations in multiuser operations. This survey collects many of these techniques for easy reference by students, researchers, and product developers. It covers inmemory sorting, diskbased external sorting, and considerations that apply specifically to sorting in database systems.
CacheConscious Sorting of Large Sets of Strings with Dynamic Tries
"... Ongoing changes in computer performance are affecting the efficiency of string sorting algorithms. The size of main memory in typical computers continues to grow, but memory accesses require increasing numbers of instruction cycles, which is a problem for the most efficient of the existing stringso ..."
Abstract

Cited by 12 (4 self)
 Add to MetaCart
Ongoing changes in computer performance are affecting the efficiency of string sorting algorithms. The size of main memory in typical computers continues to grow, but memory accesses require increasing numbers of instruction cycles, which is a problem for the most efficient of the existing stringsorting algorithms as they do not utilise cache particularly well for large data sets. We propose a new sorting algorithm for strings, burstsort, based on dynamic construction of a compact trie in which strings are kept in buckets. It is simple, fast, and efficient. We experimentally compare burstsort to existing stringsorting algorithms on large and small sets of strings with a range of characteristics. These experiments show that, for large sets of strings, burstsort is almost twice as fast as any previous algorithm, due primarily to a lower rate of cache miss.
2003b). Using masks, suffix arraybased data structures, and multidimensional arrays to compute positional ngram statistics from corpora
 In Proceedings of the Workshop on Multiword Expressions of the 41st Annual Meeting of the Association of Computational Linguistics
"... This paper describes an implementation to compute positional ngram statistics (i.e. Frequency and Mutual Expectation) based on masks, suffix arraybased data structures and multidimensional arrays. Positional ngrams are ordered sequences of words that represent continuous or discontinuous substrings ..."
Abstract

Cited by 9 (4 self)
 Add to MetaCart
This paper describes an implementation to compute positional ngram statistics (i.e. Frequency and Mutual Expectation) based on masks, suffix arraybased data structures and multidimensional arrays. Positional ngrams are ordered sequences of words that represent continuous or discontinuous substrings of a corpus. In particular, the positional ngram model has shown successful results for the extraction of discontinuous collocations from large corpora. However, its computation is heavy. For instance, 4.299.742 positional ngrams (n=1..7) can be generated from a 100.000word size corpus in a sevenword size window context. In comparison, only 700.000 ngrams would be computed for the classical ngram model. It is clear that huge efforts need to be made to process positional ngram statistics in reasonable time and space. Our solution shows O(h(F) N log N) time complexity where N is the corpus size and h(F) a function of the window context. 1
Cacheefficient string sorting using copying
 In submission
, 2006
"... Abstract. Burstsort is a cacheoriented sorting technique that uses a dynamic trie to efficiently divide large sets of string keys into related subsets small enough to sort in cache. In our original burstsort, string keys sharing a common prefix were managed via a bucket of pointers represented as a ..."
Abstract

Cited by 8 (3 self)
 Add to MetaCart
Abstract. Burstsort is a cacheoriented sorting technique that uses a dynamic trie to efficiently divide large sets of string keys into related subsets small enough to sort in cache. In our original burstsort, string keys sharing a common prefix were managed via a bucket of pointers represented as a list or array; this approach was found to be up to twice as fast as the previous best string sorts, mostly because of a sharp reduction in outofcache references. In this paper we introduce Cburstsort, which copies the unexamined tail of each key to the bucket and discards the original key to improve data locality. On both Intel and PowerPC architectures, and on a wide range of string types, we show that sorting is typically twice as fast as our original burstsort, and four to five times faster than multikey quicksort and previous radixsorts. A variant that copies both suffixes and record pointers to buckets, CPburstsort, uses more memory but provides stable sorting. In current computers, where performance is limited by memory access latencies, these new algorithms can dramatically reduce the time needed for internal sorting of large numbers of strings. 1
Generic topdown discrimination
, 2009
"... We introduce the notion of discrimination as a generalization of both sorting and partitioning and show that discriminators (discrimination functions) can be defined generically, by structural recursion on order and equivalence expressions denoting a rich class of total preorders and equivalence rel ..."
Abstract

Cited by 2 (2 self)
 Add to MetaCart
We introduce the notion of discrimination as a generalization of both sorting and partitioning and show that discriminators (discrimination functions) can be defined generically, by structural recursion on order and equivalence expressions denoting a rich class of total preorders and equivalence relations, respectively. Discriminators improve the asymptotic performance of generic comparisonbased sorting and partitioning, yet do not expose more information than the underlying ordering relation, respectively equivalence. For a large class of order and equivalence expressions, including all standard orders for firstorder recursive types, the discriminators execute in worstcase linear time. The generic discriminators can be coded compactly using list comprehensions, with order expressions specified using Generalized Algebraic Data Types (GADTs). We give some examples of the uses of discriminators, including a new mostsignificantdigit lexicographic sorting algorithm and type isomorphism with an associativecommutative operator. Full source code of discriminators and their applications is included. 1 We argue discriminators should be basic operations for primitive and abstract types with equality. The basic multiset discriminator for references, originally due to Paige et al., is shown to be both efficient and fully abstract: it finds all duplicates of all references occurring in a list in linear time without leaking information about their representation. In particular, it behaves deterministically in the presence of garbage collection and nondeterministic heap allocation even when references are represented as raw machine addresses. In contrast, having only a binary equality test as in ML requires Θ(n 2) time, and allowing hashing for performance reasons as in Java, makes execution nondeterministic and complicates garbage collection.
Making a fast unstable sorting algorithm stable
, 2006
"... This paper demonstrates how an unstable in place sorting algorithm, the ALR algorithm, can be made stable by temporary changing the sorting keys during the recursion. At ‘the bottom of the recursion ’ all subsequences with equal valued element are then individually sorted with a stable sorting subal ..."
Abstract

Cited by 2 (2 self)
 Add to MetaCart
This paper demonstrates how an unstable in place sorting algorithm, the ALR algorithm, can be made stable by temporary changing the sorting keys during the recursion. At ‘the bottom of the recursion ’ all subsequences with equal valued element are then individually sorted with a stable sorting subalgorithm (insertion sort or radix). Later, on backtrack the original keys are restored. This results in a stable sorting of the whole input. Unstable ALR is much faster than Quicksort (which is also unstable). In this paper it is demonstrated that StableALR, which is some 1030 % slower than the original unstable ALR, is still in most cases 2060 % faster than Quicksort. It is also shown to be faster than Flashsort, a new unstable in place, bucket type sorting algorithm. This is demonstrated for five different distributions of integers in a array of length from 50 to 97 million elements. The StableALR sorting algorithm can be extended to sort floating point numbers and strings and make effective use of a multi core CPU.
Using random sampling to build approximate tries for efficient string sorting
 In Proc. International Workshop on Efficient and Experimental
"... Abstract. Algorithms for sorting large datasets can be made more efficient with careful use of memory hierarchies and reduction in the number of costly memory accesses. In earlier work, we introduced burstsort, a new string sorting algorithm that on large sets of strings is almost twice as fast as p ..."
Abstract

Cited by 1 (1 self)
 Add to MetaCart
Abstract. Algorithms for sorting large datasets can be made more efficient with careful use of memory hierarchies and reduction in the number of costly memory accesses. In earlier work, we introduced burstsort, a new string sorting algorithm that on large sets of strings is almost twice as fast as previous algorithms, primarily because it is more cacheefficient. The approach in burstsort is to dynamically build a small trie that is used to rapidly allocate each string to a bucket. In this paper, we introduce new variants of our algorithm: SRburstsort, DRburstsort, and DRLburstsort. These algorithms use a random sample of the strings to construct an approximation to the trie prior to sorting. Our experimental results with sets of over 30 million strings show that the new variants reduce cache misses further than did the original burstsort, by up to 37%, while simultaneously reducing instruction counts by up to 24%. In pathological cases, even further savings can be obtained. 1
Post BWT Stages of the . . .
"... The lossless BurrowsWheeler compression algorithm has received considerable attention over recent years for both its simplicity and effectiveness. It is based on a permutation of the input sequence − the BurrowsWheeler transformation − which groups symbols with a similar context close together. In ..."
Abstract
 Add to MetaCart
The lossless BurrowsWheeler compression algorithm has received considerable attention over recent years for both its simplicity and effectiveness. It is based on a permutation of the input sequence − the BurrowsWheeler transformation − which groups symbols with a similar context close together. In the original version, this permutation was followed by a MoveToFront transformation and a final entropy coding stage. Later versions used different algorithms, placed after the BurrowsWheeler transformation, since the following stages have a significant influence on the compression rate. This article describes different algorithms and improvements for these post BWT stages including a new context based approach. Results for compression rates are presented together with compression and decompression times on the Calgary corpus, the Canterbury corpus, the large Canterbury corpus and the Lukas 2D 16 bit medical image corpus.
An Efficient, Versatile . . .
, 2008
"... Sorting the suffixes of a string into lexicographical order is a fundamental task in a number of contexts, most notably lossless compression (Burrows–Wheeler transformation) and text indexing (suffix arrays). Most approaches to suffix sorting produce a sorted array of suffixes directly, continually ..."
Abstract
 Add to MetaCart
Sorting the suffixes of a string into lexicographical order is a fundamental task in a number of contexts, most notably lossless compression (Burrows–Wheeler transformation) and text indexing (suffix arrays). Most approaches to suffix sorting produce a sorted array of suffixes directly, continually moving suffixes into their final place in the array until the ordering is complete. In this article, we describe a novel and resourceefficient (time and memory) approach to suffix sorting, which works in a complementary way—by assigning each suffix its rank in the final ordering, before converting to a sorted array, if necessary, once all suffixes are ranked. We layer several powerful extensions on this basic idea and show experimentally that our approach is superior to other leading algorithms in a variety of realworld contexts.