Results 1–10 of 11
Small forwarding tables for fast routing lookups
 In ACM SIGCOMM
, 1997
Abstract

Cited by 172 (0 self)
For some time, the networking community has assumed that it is impossible to do IP routing lookups in software fast enough to support gigabit speeds. IP routing lookups must find the routing entry with the longest matching prefix, a task that has been thought to require hardware support at lookup frequencies of millions per second. We present a forwarding table data structure designed for quick routing lookups. Forwarding tables are small enough to fit in the cache of a conventional general-purpose processor. With the table in cache, a 200 MHz Pentium Pro or a 333 MHz Alpha 21164 can perform a few million lookups per second. This means that it is feasible to do a full routing lookup for each IP packet at gigabit speeds without special hardware. The forwarding tables are very small: a large routing table with 40,000 routing entries can be compacted to a forwarding table of 150–160 Kbytes. A lookup typically requires less than 100 instructions on an Alpha, using eight memory references accessing a total of 14 bytes.
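The core operation the abstract describes is longest-matching-prefix lookup. As an illustration only, here is a minimal sketch in Python using a naive binary trie (this is not the paper's compact three-level forwarding table; the route prefixes and next-hop names are invented for the example):

```python
# Minimal longest-prefix-match sketch (illustrative only; not the paper's
# compressed forwarding table). Routes map bit-string prefixes to next hops;
# lookup walks a binary trie and remembers the deepest match seen so far.

class TrieNode:
    def __init__(self):
        self.children = {}    # '0'/'1' -> TrieNode
        self.next_hop = None  # set when a route prefix ends here

def insert(root, prefix, next_hop):
    node = root
    for bit in prefix:
        node = node.children.setdefault(bit, TrieNode())
    node.next_hop = next_hop

def lookup(root, addr_bits):
    """Return the next hop of the longest matching prefix, or None."""
    node, best = root, root.next_hop
    for bit in addr_bits:
        node = node.children.get(bit)
        if node is None:
            break
        if node.next_hop is not None:
            best = node.next_hop
    return best

root = TrieNode()
insert(root, "10", "A")          # hypothetical route 10/2 -> next hop A
insert(root, "1011", "B")        # hypothetical route 1011/4 -> next hop B
print(lookup(root, "10110000"))  # longest match is 1011 -> B
print(lookup(root, "10000000"))  # only 10 matches -> A
```

The paper's contribution is precisely that this trie can be compacted until it fits in an on-chip cache, which a pointer-chasing sketch like the above does not attempt.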
Cache-efficient string sorting using copying
 In submission
, 2006
Abstract

Cited by 8 (3 self)
Burstsort is a cache-oriented sorting technique that uses a dynamic trie to efficiently divide large sets of string keys into related subsets small enough to sort in cache. In our original burstsort, string keys sharing a common prefix were managed via a bucket of pointers represented as a list or array; this approach was found to be up to twice as fast as the previous best string sorts, mostly because of a sharp reduction in out-of-cache references. In this paper we introduce C-burstsort, which copies the unexamined tail of each key to the bucket and discards the original key to improve data locality. On both Intel and PowerPC architectures, and on a wide range of string types, we show that sorting is typically twice as fast as our original burstsort, and four to five times faster than multikey quicksort and previous radix sorts. A variant that copies both suffixes and record pointers to buckets, CP-burstsort, uses more memory but provides stable sorting. In current computers, where performance is limited by memory access latencies, these new algorithms can dramatically reduce the time needed for internal sorting of large numbers of strings.
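As a rough illustration of the bursting idea behind this family of algorithms, the toy Python sketch below routes strings through a trie of buckets and "bursts" a full bucket into a deeper trie node (the bucket limit is artificially small here; this is not the authors' tuned implementation, which manages bucket memory far more carefully):

```python
# Toy burstsort sketch: a trie on leading characters routes each string into
# a small bucket; a full bucket "bursts" into a deeper trie node; finally
# buckets are sorted individually and concatenated in trie order.

BUCKET_LIMIT = 4  # tiny threshold for illustration; real burstsort uses thousands

def burstsort(strings):
    root = {}
    for s in strings:
        _insert(root, s, 0)
    out = []
    _collect(root, out)
    return out

def _insert(node, s, depth):
    key = s[depth] if depth < len(s) else ""  # "" holds exhausted strings
    bucket_or_child = node.setdefault(key, [])
    if isinstance(bucket_or_child, dict):
        _insert(bucket_or_child, s, depth + 1)
        return
    bucket_or_child.append(s)
    if key != "" and len(bucket_or_child) > BUCKET_LIMIT:  # burst the bucket
        child = {}
        for t in bucket_or_child:
            _insert(child, t, depth + 1)
        node[key] = child

def _collect(node, out):
    for key in sorted(node):  # "" sorts first, so shorter strings come first
        sub = node[key]
        if isinstance(sub, dict):
            _collect(sub, out)
        else:
            out.extend(sorted(sub))  # small bucket: any in-cache sort will do
```

The cache benefit the papers measure comes from each bucket being small enough to sort entirely in cache; the C-burstsort refinement additionally copies key tails into the buckets so that even the bucket sort avoids chasing pointers to the original strings.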
On the Performance of WEAK-HEAPSORT
, 2000
Abstract

Cited by 6 (2 self)
Dutton (1993) presents a further HEAPSORT variant called WEAK-HEAPSORT, which also contains a new data structure for priority queues. The sorting algorithm and the underlying data structure are analyzed, showing that WEAK-HEAPSORT is the best HEAPSORT variant and that it has many nice properties. It is shown that the worst-case number of comparisons is n⌈log n⌉ − 2^⌈log n⌉ + n − ⌈log n⌉ ≤ n log n + 0.1n, and that weak heaps can be generated with n − 1 comparisons. A double-ended priority queue based on weak heaps can be generated in n + ⌈n/2⌉ − 2 comparisons. Moreover, examples for the worst and the best case of WEAK-HEAPSORT are presented, the number of weak heaps on {1, ..., n} is determined, and experiments on the average case are reported.
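The worst-case comparison count n⌈log n⌉ − 2^⌈log n⌉ + n − ⌈log n⌉ and its bound n log n + 0.1n can be checked numerically. The short Python sketch below does only that sanity check of the formula; it is not an implementation of the sorting algorithm itself:

```python
# Numeric sanity check of the stated worst-case comparison count and bound.
import math

def weakheapsort_worst_case(n):
    """Exact worst-case count: n*ceil(log2 n) - 2^ceil(log2 n) + n - ceil(log2 n)."""
    k = math.ceil(math.log2(n))
    return n * k - 2**k + n - k

# The bound n*log2(n) + 0.1*n should dominate the exact count for all n >= 2.
for n in range(2, 10000):
    assert weakheapsort_worst_case(n) <= n * math.log2(n) + 0.1 * n
print("bound holds for n = 2 .. 9999")
```

The slack in the bound is smallest when n is just below twice a power of two, where n⌈log n⌉ − 2^⌈log n⌉ exceeds n log n by roughly 0.086n.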
Improvements to the Burrows-Wheeler Compression Algorithm: After BWT Stages
, 2003
Abstract

Cited by 4 (1 self)
... This article describes improved algorithms for the run-length encoding, inversion frequencies and weighted frequency count stages that follow the Burrows-Wheeler Transform. Results for compression rates are presented for different variations of the algorithm together with compression and decompression times. Finally, an implementation with a compression rate of 2.238 bps on the Calgary Corpus is introduced, which is the best result published in this field to date.
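For readers unfamiliar with the pipeline, a naive Python sketch of the transform followed by a run-length stage is shown below. This is illustrative only: the paper's post-BWT stages are far more elaborate than plain run-length coding, and a real BWT uses suffix sorting rather than materializing all rotations:

```python
# Naive Burrows-Wheeler Transform plus a simple run-length stage.
# The BWT tends to group equal characters into runs, which is what
# the post-BWT stages (RLE, inversion frequencies, WFC) exploit.

def bwt(s, sentinel="\0"):
    """Return the last column of the sorted rotations of s + sentinel."""
    s += sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

def rle(s):
    """Run-length encode a string as (char, run_length) pairs."""
    runs = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1][1] += 1
        else:
            runs.append([ch, 1])
    return [(c, n) for c, n in runs]

transformed = bwt("banana")
print(repr(transformed))   # 'annb\x00aa' -- note the runs of 'n' and 'a'
print(rle(transformed))
```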
Algorithms for Combinatorial Problems Related to Train Marshalling
 In Proceedings of AWOCA 2000, Hunter Valley
, 2000
Abstract

Cited by 4 (0 self)
We discuss a train marshalling principle on a hump yard based on radix sort. Initially we show that the number of sorting steps depends on the number of "chains" in the permutation that maps the final position of each car to its initial position. A chain is here a maximal interval I = [i, j] on which the permutation is monotonically increasing. This adaptive radix sorting scheme requires that the numbers of the items form an initial segment {1, ..., n} of the natural numbers. We also discuss how to proceed if the final position is not fixed but only has to satisfy certain requirements, e.g. cars of the same train have to appear consecutively. In general, we specify an ordering requirement by a PQ-tree where the leaves are the cars and inner nodes correspond to "blocks" (a train forms a block, a final destination within a train forms a block). In some blocks, the subblocks may be permuted in any order (corresponding to P-nodes), whereas in other blocks, the sequence of im...
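Under one plausible reading of the chain definition in the abstract (a chain as a maximal interval on which the permutation increases), the chain count is simply one more than the number of descents, which is easy to compute:

```python
# Counting "chains" under the reading above: a chain is a maximal
# increasing run, so chains = descents + 1 for a non-empty sequence.

def count_chains(perm):
    """Number of maximal increasing runs (chains) of the sequence."""
    if not perm:
        return 0
    return 1 + sum(1 for a, b in zip(perm, perm[1:]) if a > b)

print(count_chains([1, 2, 3, 4]))  # already sorted: one chain
print(count_chains([3, 1, 4, 2]))  # descents at 3>1 and 4>2: three chains
```

In the marshalling setting, fewer chains means fewer sorting steps over the hump, which is why the scheme is adaptive in the disorder of the incoming cars.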
Efficient Trie-Based Sorting of Large Sets of Strings
 Proceedings of the Australasian Computer Science Conference
, 2003
Abstract

Cited by 3 (1 self)
Sorting is a fundamental algorithmic task. Many general-purpose sorting algorithms have been developed, but efficiency gains can be achieved by designing algorithms for specific kinds of data, such as strings. In previous work we have shown that our burstsort, a trie-based algorithm for sorting strings, is more efficient on large data sets than all previous algorithms for this task. In this paper we re-evaluate some of the implementation details of burstsort, in particular the method for managing the buckets held at leaves. We show that a better choice of data structures further improves efficiency, at a small additional cost in memory. For sets of around 30,000,000 strings, our improved burstsort is nearly twice as fast as the previous best sorting algorithm.
Refining the Pure-C Cost Model
, 2001
Abstract

Cited by 2 (0 self)
The pure-C cost model, a simple but realistic model for the performance of programs, was first presented by Katajainen and Träff in 1997. The model was refined by Bojesen, Katajainen and Spork in 1999 to include the cost of cache misses. In this thesis the model is further refined to include the cost of branch mispredictions. Furthermore, it is discussed how to modify the pure-C cost model to account for the instruction-level parallelism of today's superscalar processors.
Implementing HEAPSORT with n log n − 0.9n and QUICKSORT with n log n + 0.2n Comparisons
 ACM Journal of Experimental Algorithmics
, 2002
"... With refinements to the WEAK-HEAPSORT... ..."
Pushing the Limits in Sequential Sorting
 Proceedings of the 4th International Workshop on Algorithm Engineering (WAE 2000)
, 2000
Abstract

Cited by 2 (0 self)
With refinements to the WEAK-HEAPSORT algorithm we establish the general and practically relevant sequential sorting algorithm RELAXED-WEAK-HEAPSORT, executing exactly n⌈log n⌉ − 2^⌈log n⌉ + 1 ≤ n log n − 0.9n comparisons on any given input. The number of transpositions is bounded by n plus the number of comparisons. Experiments show that RELAXED-WEAK-HEAPSORT only requires O(n) extra bits. Even if this space is not available, with QUICK-WEAK-HEAPSORT we propose an efficient QUICKSORT variant with n log n + 0.2n + o(n) comparisons on the average. Furthermore, we present data showing that WEAK-HEAPSORT, RELAXED-WEAK-HEAPSORT and QUICK-WEAK-HEAPSORT beat other performant QUICKSORT and HEAPSORT variants even for moderate values of n.
Using random sampling to build approximate tries for efficient string sorting
 In Proc. International Workshop on Efficient and Experimental
Abstract

Cited by 1 (1 self)
Algorithms for sorting large datasets can be made more efficient with careful use of memory hierarchies and reduction in the number of costly memory accesses. In earlier work, we introduced burstsort, a new string sorting algorithm that on large sets of strings is almost twice as fast as previous algorithms, primarily because it is more cache-efficient. The approach in burstsort is to dynamically build a small trie that is used to rapidly allocate each string to a bucket. In this paper, we introduce new variants of our algorithm: SRburstsort, DRburstsort, and DRLburstsort. These algorithms use a random sample of the strings to construct an approximation to the trie prior to sorting. Our experimental results with sets of over 30 million strings show that the new variants reduce cache misses further than did the original burstsort, by up to 37%, while simultaneously reducing instruction counts by up to 24%. In pathological cases, even further savings can be obtained.
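One hedged sketch of the sampling idea in Python is shown below. The function name, thresholds, and prefix-frequency heuristic are ours for illustration, not the paper's: the point is only that a small random sample can predict which trie buckets will be large, so those can be pre-split before the main insertion pass:

```python
# Hedged sketch: use a random sample to find prefixes likely to produce
# large buckets, so the trie can be pre-split under them before sorting.
import random
from collections import Counter

def sample_hot_prefixes(strings, sample_size=1000, prefix_len=1, min_share=0.05):
    """Return prefixes whose sample frequency suggests a large bucket."""
    sample = random.sample(strings, min(sample_size, len(strings)))
    counts = Counter(s[:prefix_len] for s in sample if len(s) >= prefix_len)
    threshold = min_share * len(sample)
    return {p for p, c in counts.items() if c >= threshold}

# 98 of 100 strings start with 'a', so any sample of 50 flags 'a' as hot.
data = ["apple"] * 50 + ["ant"] * 48 + ["zebra"] * 2
print(sample_hot_prefixes(data, sample_size=50))  # -> {'a'}
```

Pre-splitting hot buckets is what saves the bursting work (and the associated cache misses) that the original burstsort would otherwise perform during insertion.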