Results 1  10
of
39
Fast Algorithms for Sorting and Searching Strings
, 1997
"... We present theoretical algorithms for sorting and searching multikey data, and derive from them practical C implementations for applications in which keys are character strings. The sorting algorithm blends Quicksort and radix sort; it is competitive with the best known C sort codes. The searching a ..."
Abstract

Cited by 147 (0 self)
 Add to MetaCart
We present theoretical algorithms for sorting and searching multikey data, and derive from them practical C implementations for applications in which keys are character strings. The sorting algorithm blends Quicksort and radix sort; it is competitive with the best known C sort codes. The searching algorithm blends tries and binary search trees; it is faster than hashing and other commonly used search methods. The basic ideas behind the algorithms date back at least to the 1960s, but their practical utility has been overlooked. We also present extensions to more complex string problems, such as partialmatch searching. 1. Introduction Section 2 briefly reviews Hoare's [9] Quicksort and binary search trees. We emphasize a wellknown isomorphism relating the two, and summarize other basic facts. The multikey algorithms and data structures are presented in Section 3. Multikey Quicksort orders a set of n vectors with k components each. Like regular Quicksort, it partitions its input into...
IPAddress Lookup Using LCTries
, 1998
"... There has recently been a notable interest in the organization of routing information to enable fast lookup of IP addresses. The interest is primarily motivated by the goal of building multiGb/s routers for the Internet, without having to rely on multilayer switching techniques. We address this ..."
Abstract

Cited by 102 (0 self)
 Add to MetaCart
There has recently been a notable interest in the organization of routing information to enable fast lookup of IP addresses. The interest is primarily motivated by the goal of building multiGb/s routers for the Internet, without having to rely on multilayer switching techniques. We address this problem by using an LCtrie, a trie structure with combined path and level compression. This data structure enables us to build efficient, compact and easily searchable implementations of an IP routing table. The structure can store both unicast and multicast addresses with the same average search times. The search depth increases as \Theta (log log n) with the number of entries in the table for a large class of distributions and it is independent of the length of the addresses. A node in the trie can be coded with four bytes. Only the size of the base vector, which contains the search strings, grows linearly with the length of the addresses when extended from 4 to 16 bytes, as mandated by the shift from IP version 4 to version 6. We present the basic structure, as well as an adaptive version that roughly doubles the number of lookups per second. More general classifications of packets that are needed for link sharing, quality of service provisioning and for multicast and multipath routing are also discussed. Our experimental results compare favorably with those reported previously in the research literature.
Engineering a lightweight suffix array construction algorithm (Extended Abstract)
"... In this paper we consider the problem of computing the suffix array of a text T [1, n]. This problem consists in sorting the suffixes of T in lexicographic order. The suffix array [16] (or pat array [9]) is a simple, easy to code, and elegant data structure used for several fundamental string matchi ..."
Abstract

Cited by 59 (4 self)
 Add to MetaCart
In this paper we consider the problem of computing the suffix array of a text T [1, n]. This problem consists in sorting the suffixes of T in lexicographic order. The suffix array [16] (or pat array [9]) is a simple, easy to code, and elegant data structure used for several fundamental string matching problems involving both linguistic texts and biological data [4, 11]. Recently, the interest in this data structure has been revitalized by its use as a building block for three novel applications: (1) the BurrowsWheeler compression algorithm [3], which is a provably [17] and practically [20] effective compression tool; (2) the construction of succinct [10, 19] and compressed [7, 8] indexes; the latter can store both the input text and its fulltext index using roughly the same space used by traditional compressors for the text alone; and (3) algorithms for clustering and ranking the answers to user queries in websearch engines [22]. In all these applications the construction of the suffix array is the computational bottleneck both in time and space. This motivated our interest in designing yet another suffix array construction algorithm which is fast and "lightweight" in the sense that it uses small space...
Second Phase Changes in Random MAry Search Trees and Generalized Quicksort: Convergence Rates
, 2002
"... We study the convergence rate to normal limit law for the space requirement of random mary search trees. While it is known that the random variable is asymptotically normally distributed for 3 m 26 and that the limit law does not exist for m ? 26, we show that the convergence rate is O(n ) for ..."
Abstract

Cited by 46 (12 self)
 Add to MetaCart
We study the convergence rate to normal limit law for the space requirement of random mary search trees. While it is known that the random variable is asymptotically normally distributed for 3 m 26 and that the limit law does not exist for m ? 26, we show that the convergence rate is O(n ) for 3 m 19 and is O(n ), where 4=3 ! ff ! 3=2 is a parameter depending on m for 20 m 26. Our approach is based on a refinement to the method of moments and applicable to other recursive random variables; we briefly mention the applications to quicksort proper and the generalized quicksort of Hennequin, where more phase changes are given. These results provide natural, concrete examples for which the BerryEsseen bounds are not necessarily proportional to the reciprocal of the standard deviation. Local limit theorems are also derived. Abbreviated title. Phase changes in search trees.
Phase Change of Limit Laws in the Quicksort Recurrence Under Varying Toll Functions
, 2001
"... We characterize all limit laws of the quicksort type random variables defined recursively by Xn = X In + X # n1In + Tn when the "toll function" Tn varies and satisfies general conditions, where (Xn ), (X # n ), (I n , Tn ) are independent, Xn . . . , n 1}. When the "toll function" Tn ..."
Abstract

Cited by 46 (19 self)
 Add to MetaCart
We characterize all limit laws of the quicksort type random variables defined recursively by Xn = X In + X # n1In + Tn when the "toll function" Tn varies and satisfies general conditions, where (Xn ), (X # n ), (I n , Tn ) are independent, Xn . . . , n 1}. When the "toll function" Tn (cost needed to partition the original problem into smaller subproblems) is small (roughly lim sup n## log E(Tn )/ log n 1/2), Xn is asymptotically normally distributed; nonnormal limit laws emerge when Tn becomes larger. We give many new examples ranging from the number of exchanges in quicksort to sorting on broadcast communication model, from an insitu permutation algorithm to tree traversal algorithms, etc.
FASTER SUFFIX SORTING
, 1999
"... We propose a fast and memory efficient algorithm for lexicographically sorting the suffixes of a string, a problem that has important applications in data compression as well as string matching. Our ..."
Abstract

Cited by 45 (2 self)
 Add to MetaCart
We propose a fast and memory efficient algorithm for lexicographically sorting the suffixes of a string, a problem that has important applications in data compression as well as string matching. Our
A taxonomy of suffix array construction algorithms
 ACM Computing Surveys
, 2007
"... In 1990, Manber and Myers proposed suffix arrays as a spacesaving alternative to suffix trees and described the first algorithms for suffix array construction and use. Since that time, and especially in the last few years, suffix array construction algorithms have proliferated in bewildering abunda ..."
Abstract

Cited by 39 (10 self)
 Add to MetaCart
In 1990, Manber and Myers proposed suffix arrays as a spacesaving alternative to suffix trees and described the first algorithms for suffix array construction and use. Since that time, and especially in the last few years, suffix array construction algorithms have proliferated in bewildering abundance. This survey paper attempts to provide simple highlevel descriptions of these numerous algorithms that highlight both their distinctive features and their commonalities, while avoiding as much as possible the complexities of implementation details. New hybrid algorithms are also described. We provide comparisons of the algorithms ’ worstcase time complexity and use of additional space, together with results of recent experimental test runs on many of their implementations.
Introspective Sorting and Selection Algorithms
 Software Practice and Experience
, 1997
"... Quicksort is the preferred inplace sorting algorithm in many contexts, since its average computing time on uniformly distributed inputs is \Theta(N log N) and it is in fact faster than most other sorting algorithms on most inputs. Its drawback is that its worstcase time bound is \Theta(N ). Previo ..."
Abstract

Cited by 37 (1 self)
 Add to MetaCart
Quicksort is the preferred inplace sorting algorithm in many contexts, since its average computing time on uniformly distributed inputs is \Theta(N log N) and it is in fact faster than most other sorting algorithms on most inputs. Its drawback is that its worstcase time bound is \Theta(N ). Previous attempts to protect against the worst case by improving the way quicksort chooses pivot elements for partitioning have increased the average computing time too muchone might as well use heapsort, which has a \Theta(N log N) worstcase time bound but is on the average 2 to 5 times slower than quicksort. A similar dilemma exists with selection algorithms (for finding the ith largest element) based on partitioning. This paper describes a simple solution to this dilemma: limit the depth of partitioning, and for subproblems that exceed the limit switch to another algorithm with a better worstcase bound. Using heapsort as the "stopper" yields a sorting algorithm that is just as fast as quicksort in the average case but also has an \Theta(N log N) worst case time bound. For selection, a hybrid of Hoare's find algorithm, which is linear on average but quadratic in the worst case, and the BlumFloydPrattRivestTarjan algorithm is as fast as Hoare's algorithm in practice, yet has a linear worstcase time bound. Also discussed are issues of implementing the new algorithms as generic algorithms and accurately measuring their performance in the framework of the C++ Standard Template Library.
Optimal Sampling Strategies in Quicksort and Quickselect
 PROC. OF THE 25TH INTERNATIONAL COLLOQUIUM (ICALP98), VOLUME 1443 OF LNCS
, 1998
"... It is well known that the performance of quicksort can be substantially improved by selecting the median of a sample of three elements as the pivot of each partitioning stage. This variant is easily generalized to samples of size s = 2k + 1. For large samples the partitions are better as the median ..."
Abstract

Cited by 28 (4 self)
 Add to MetaCart
It is well known that the performance of quicksort can be substantially improved by selecting the median of a sample of three elements as the pivot of each partitioning stage. This variant is easily generalized to samples of size s = 2k + 1. For large samples the partitions are better as the median of the sample makes a more accurate estimate of the median of the array to be sorted, but the amount of additional comparisons and exchanges to find the median of the sample also increases. We show that the optimal sample size to minimize the average total cost of quicksort (which includes both comparisons and exchanges) is s = a \Delta p n + o( p n ). We also give a closed expression for the constant factor a, which depends on the medianfinding algorithm and the costs of elementary comparisons and exchanges. The result above holds in most situations, unless the cost of an exchange exceeds by far the cost of a comparison. In that particular case, it is better to select not the median of...
Engineering a cacheoblivious sorting algorithm
 In Proc. 6th Workshop on Algorithm Engineering and Experiments
, 2004
"... The cacheoblivious model of computation is a twolevel memory model with the assumption that the parameters of the model are unknown to the algorithms. A consequence of this assumption is that an algorithm efficient in the cache oblivious model is automatically efficient in a multilevel memory mod ..."
Abstract

Cited by 23 (1 self)
 Add to MetaCart
The cacheoblivious model of computation is a twolevel memory model with the assumption that the parameters of the model are unknown to the algorithms. A consequence of this assumption is that an algorithm efficient in the cache oblivious model is automatically efficient in a multilevel memory model. Since the introduction of the cacheoblivious model by Frigo et al. in 1999, a number of algorithms and data structures in the model has been proposed and analyzed. However, less attention has been given to whether the nice theoretical proporities of cacheoblivious algorithms carry over into practice. This paper is an algorithmic engineering study of cacheoblivious sorting. We investigate a number of implementation issues and parameters choices for the cacheoblivious sorting algorithm Lazy Funnelsort by empirical methods, and compare the final algorithm with Quicksort, the established standard for comparison based sorting, as well as with recent cacheaware proposals. The main result is a carefully implemented cacheoblivious sorting algorithm, which we compare to the best implementation of Quicksort we can find, and find that it competes very well for input residing in RAM, and outperforms Quicksort for input on disk. 1