Results 1 - 10
of
34
Fast Algorithms for Sorting and Searching Strings
, 1997
"... We present theoretical algorithms for sorting and searching multikey data, and derive from them practical C implementations for applications in which keys are character strings. The sorting algorithm blends Quicksort and radix sort; it is competitive with the best known C sort codes. The searching a ..."
Abstract
-
Cited by 131 (0 self)
- Add to MetaCart
We present theoretical algorithms for sorting and searching multikey data, and derive from them practical C implementations for applications in which keys are character strings. The sorting algorithm blends Quicksort and radix sort; it is competitive with the best known C sort codes. The searching algorithm blends tries and binary search trees; it is faster than hashing and other commonly used search methods. The basic ideas behind the algorithms date back at least to the 1960s, but their practical utility has been overlooked. We also present extensions to more complex string problems, such as partial-match searching. 1. Introduction Section 2 briefly reviews Hoare's [9] Quicksort and binary search trees. We emphasize a well-known isomorphism relating the two, and summarize other basic facts. The multikey algorithms and data structures are presented in Section 3. Multikey Quicksort orders a set of n vectors with k components each. Like regular Quicksort, it partitions its input into...
IP-Address Lookup Using LC-Tries
, 1998
"... There has recently been a notable interest in the organization of routing information to enable fast lookup of IP addresses. The interest is primarily motivated by the goal of building multi-Gb/s routers for the Internet, without having to rely on multi-layer switching techniques. We address this ..."
Abstract
-
Cited by 85 (0 self)
- Add to MetaCart
There has recently been a notable interest in the organization of routing information to enable fast lookup of IP addresses. The interest is primarily motivated by the goal of building multi-Gb/s routers for the Internet, without having to rely on multi-layer switching techniques. We address this problem by using an LC-trie, a trie structure with combined path and level compression. This data structure enables us to build efficient, compact and easily searchable implementations of an IP routing table. The structure can store both unicast and multicast addresses with the same average search times. The search depth increases as \Theta (log log n) with the number of entries in the table for a large class of distributions and it is independent of the length of the addresses. A node in the trie can be coded with four bytes. Only the size of the base vector, which contains the search strings, grows linearly with the length of the addresses when extended from 4 to 16 bytes, as mandated by the shift from IP version 4 to version 6. We present the basic structure, as well as an adaptive version that roughly doubles the number of lookups per second. More general classifications of packets that are needed for link sharing, quality of service provisioning and for multicast and multipath routing are also discussed. Our experimental results compare favorably with those reported previously in the research literature.
Engineering a lightweight suffix array construction algorithm (Extended Abstract)
"... In this paper we consider the problem of computing the suffix array of a text T [1, n]. This problem consists in sorting the suffixes of T in lexicographic order. The suffix array [16] (or pat array [9]) is a simple, easy to code, and elegant data structure used for several fundamental string matchi ..."
Abstract
-
Cited by 57 (4 self)
- Add to MetaCart
In this paper we consider the problem of computing the suffix array of a text T [1, n]. This problem consists in sorting the suffixes of T in lexicographic order. The suffix array [16] (or pat array [9]) is a simple, easy to code, and elegant data structure used for several fundamental string matching problems involving both linguistic texts and biological data [4, 11]. Recently, the interest in this data structure has been revitalized by its use as a building block for three novel applications: (1) the Burrows-Wheeler compression algorithm [3], which is a provably [17] and practically [20] effective compression tool; (2) the construction of succinct [10, 19] and compressed [7, 8] indexes; the latter can store both the input text and its full-text index using roughly the same space used by traditional compressors for the text alone; and (3) algorithms for clustering and ranking the answers to user queries in web-search engines [22]. In all these applications the construction of the suffix array is the computational bottleneck both in time and space. This motivated our interest in designing yet another suffix array construction algorithm which is fast and "lightweight" in the sense that it uses small space...
FASTER SUFFIX SORTING
, 1999
"... We propose a fast and memory efficient algorithm for lexicographically sorting the suffixes of a string, a problem that has important applications in data compression as well as string matching. Our ..."
Abstract
-
Cited by 42 (2 self)
- Add to MetaCart
We propose a fast and memory efficient algorithm for lexicographically sorting the suffixes of a string, a problem that has important applications in data compression as well as string matching. Our
Second Phase Changes in Random M-Ary Search Trees and Generalized Quicksort: Convergence Rates
, 2002
"... We study the convergence rate to normal limit law for the space requirement of random m-ary search trees. While it is known that the random variable is asymptotically normally distributed for 3 m 26 and that the limit law does not exist for m ? 26, we show that the convergence rate is O(n ) for ..."
Abstract
-
Cited by 38 (11 self)
- Add to MetaCart
We study the convergence rate to normal limit law for the space requirement of random m-ary search trees. While it is known that the random variable is asymptotically normally distributed for 3 m 26 and that the limit law does not exist for m ? 26, we show that the convergence rate is O(n ) for 3 m 19 and is O(n ), where 4=3 ! ff ! 3=2 is a parameter depending on m for 20 m 26. Our approach is based on a refinement to the method of moments and applicable to other recursive random variables; we briefly mention the applications to quicksort proper and the generalized quicksort of Hennequin, where more phase changes are given. These results provide natural, concrete examples for which the Berry-Esseen bounds are not necessarily proportional to the reciprocal of the standard deviation. Local limit theorems are also derived. Abbreviated title. Phase changes in search trees.
Phase Change of Limit Laws in the Quicksort Recurrence Under Varying Toll Functions
, 2001
"... We characterize all limit laws of the quicksort type random variables defined recursively by Xn = X In + X # n-1-In + Tn when the "toll function" Tn varies and satisfies general conditions, where (Xn ), (X # n ), (I n , Tn ) are independent, Xn . . . , n 1}. When the "toll function" Tn ..."
Abstract
-
Cited by 37 (17 self)
- Add to MetaCart
We characterize all limit laws of the quicksort type random variables defined recursively by Xn = X In + X # n-1-In + Tn when the "toll function" Tn varies and satisfies general conditions, where (Xn ), (X # n ), (I n , Tn ) are independent, Xn . . . , n 1}. When the "toll function" Tn (cost needed to partition the original problem into smaller subproblems) is small (roughly lim sup n## log E(Tn )/ log n 1/2), Xn is asymptotically normally distributed; non-normal limit laws emerge when Tn becomes larger. We give many new examples ranging from the number of exchanges in quicksort to sorting on broadcast communication model, from an in-situ permutation algorithm to tree traversal algorithms, etc.
A taxonomy of suffix array construction algorithms
- ACM Computing Surveys
, 2007
"... In 1990, Manber and Myers proposed suffix arrays as a space-saving alternative to suffix trees and described the first algorithms for suffix array construction and use. Since that time, and especially in the last few years, suffix array construction algorithms have proliferated in bewildering abunda ..."
Abstract
-
Cited by 30 (10 self)
- Add to MetaCart
In 1990, Manber and Myers proposed suffix arrays as a space-saving alternative to suffix trees and described the first algorithms for suffix array construction and use. Since that time, and especially in the last few years, suffix array construction algorithms have proliferated in bewildering abundance. This survey paper attempts to provide simple high-level descriptions of these numerous algorithms that highlight both their distinctive features and their commonalities, while avoiding as much as possible the complexities of implementation details. New hybrid algorithms are also described. We provide comparisons of the algorithms ’ worst-case time complexity and use of additional space, together with results of recent experimental test runs on many of their implementations.
Introspective Sorting and Selection Algorithms
- Software Practice and Experience
, 1997
"... Quicksort is the preferred in-place sorting algorithm in many contexts, since its average computing time on uniformly distributed inputs is \Theta(N log N) and it is in fact faster than most other sorting algorithms on most inputs. Its drawback is that its worst-case time bound is \Theta(N ). Previo ..."
Abstract
-
Cited by 27 (1 self)
- Add to MetaCart
Quicksort is the preferred in-place sorting algorithm in many contexts, since its average computing time on uniformly distributed inputs is \Theta(N log N) and it is in fact faster than most other sorting algorithms on most inputs. Its drawback is that its worst-case time bound is \Theta(N ). Previous attempts to protect against the worst case by improving the way quicksort chooses pivot elements for partitioning have increased the average computing time too much---one might as well use heapsort, which has a \Theta(N log N) worst-case time bound but is on the average 2 to 5 times slower than quicksort. A similar dilemma exists with selection algorithms (for finding the i-th largest element) based on partitioning. This paper describes a simple solution to this dilemma: limit the depth of partitioning, and for subproblems that exceed the limit switch to another algorithm with a better worst-case bound. Using heapsort as the "stopper" yields a sorting algorithm that is just as fast as quicksort in the average case but also has an \Theta(N log N) worst case time bound. For selection, a hybrid of Hoare's find algorithm, which is linear on average but quadratic in the worst case, and the Blum-Floyd-Pratt-Rivest-Tarjan algorithm is as fast as Hoare's algorithm in practice, yet has a linear worst-case time bound. Also discussed are issues of implementing the new algorithms as generic algorithms and accurately measuring their performance in the framework of the C++ Standard Template Library.
Optimal Sampling Strategies in Quicksort and Quickselect
- PROC. OF THE 25TH INTERNATIONAL COLLOQUIUM (ICALP-98), VOLUME 1443 OF LNCS
, 1998
"... It is well known that the performance of quicksort can be substantially improved by selecting the median of a sample of three elements as the pivot of each partitioning stage. This variant is easily generalized to samples of size s = 2k + 1. For large samples the partitions are better as the median ..."
Abstract
-
Cited by 22 (3 self)
- Add to MetaCart
It is well known that the performance of quicksort can be substantially improved by selecting the median of a sample of three elements as the pivot of each partitioning stage. This variant is easily generalized to samples of size s = 2k + 1. For large samples the partitions are better as the median of the sample makes a more accurate estimate of the median of the array to be sorted, but the amount of additional comparisons and exchanges to find the median of the sample also increases. We show that the optimal sample size to minimize the average total cost of quicksort (which includes both comparisons and exchanges) is s = a \Delta p n + o( p n ). We also give a closed expression for the constant factor a, which depends on the median-finding algorithm and the costs of elementary comparisons and exchanges. The result above holds in most situations, unless the cost of an exchange exceeds by far the cost of a comparison. In that particular case, it is better to select not the median of...
Engineering a cache-oblivious sorting algorithm
- In Proc. 6th Workshop on Algorithm Engineering and Experiments
, 2004
"... The cache-oblivious model of computation is a two-level memory model with the assumption that the parameters of the model are unknown to the algorithms. A consequence of this assumption is that an algorithm efficient in the cache oblivious model is automatically efficient in a multi-level memory mod ..."
Abstract
-
Cited by 20 (1 self)
- Add to MetaCart
The cache-oblivious model of computation is a two-level memory model with the assumption that the parameters of the model are unknown to the algorithms. A consequence of this assumption is that an algorithm efficient in the cache oblivious model is automatically efficient in a multi-level memory model. Since the introduction of the cache-oblivious model by Frigo et al. in 1999, a number of algorithms and data structures in the model has been proposed and analyzed. However, less attention has been given to whether the nice theoretical proporities of cache-oblivious algorithms carry over into practice. This paper is an algorithmic engineering study of cache-oblivious sorting. We investigate a number of implementation issues and parameters choices for the cache-oblivious sorting algorithm Lazy Funnelsort by empirical methods, and compare the final algorithm with Quicksort, the established standard for comparison based sorting, as well as with recent cache-aware proposals. The main result is a carefully implemented cache-oblivious sorting algorithm, which we compare to the best implementation of Quicksort we can find, and find that it competes very well for input residing in RAM, and outperforms Quicksort for input on disk. 1

