Fast Algorithms for Sorting and Searching Strings
, 1997
Cited by 131 (0 self)
We present theoretical algorithms for sorting and searching multikey data, and derive from them practical C implementations for applications in which keys are character strings. The sorting algorithm blends Quicksort and radix sort; it is competitive with the best known C sort codes. The searching algorithm blends tries and binary search trees; it is faster than hashing and other commonly used search methods. The basic ideas behind the algorithms date back at least to the 1960s, but their practical utility has been overlooked. We also present extensions to more complex string problems, such as partial-match searching. 1. Introduction Section 2 briefly reviews Hoare's [9] Quicksort and binary search trees. We emphasize a well-known isomorphism relating the two, and summarize other basic facts. The multikey algorithms and data structures are presented in Section 3. Multikey Quicksort orders a set of n vectors with k components each. Like regular Quicksort, it partitions its input into...
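The partitioning idea behind multikey Quicksort can be illustrated with a short Python sketch. It is not the authors' C implementation: it assumes the usual three-way formulation (partition on one character position into less-than, equal, and greater-than groups, and advance to the next position only within the equal group). The names `multikey_quicksort` and `char_at` are hypothetical.

```python
def multikey_quicksort(strings):
    """Return a sorted copy of `strings` using three-way (multikey) partitioning."""
    items = list(strings)

    def char_at(s, d):
        # Code point of character d, or -1 past the end so shorter strings sort first.
        return ord(s[d]) if d < len(s) else -1

    def sort(lo, hi, d):
        # Sort items[lo:hi]; all of them agree on their first d characters.
        if hi - lo <= 1:
            return
        pivot = char_at(items[(lo + hi) // 2], d)
        lt, i, gt = lo, lo, hi - 1
        while i <= gt:
            c = char_at(items[i], d)
            if c < pivot:
                items[lt], items[i] = items[i], items[lt]
                lt += 1
                i += 1
            elif c > pivot:
                items[i], items[gt] = items[gt], items[i]
                gt -= 1
            else:
                i += 1
        sort(lo, lt, d)              # character d smaller than the pivot
        if pivot >= 0:               # equal group: move on to character d + 1
            sort(lt, gt + 1, d + 1)
        sort(gt + 1, hi, d)          # character d larger than the pivot

    sort(0, len(items), 0)
    return items
```

Because the equal group is the only one that advances to the next character, each character of a key is inspected only when it is needed to separate that key from its neighbours.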
The influence of caches on the performance of sorting
- In Proceedings of the Seventh Annual ACM-SIAM Symposium on Discrete Algorithms
, 1997
Cited by 104 (3 self)
We investigate the effect that caches have on the performance of sorting algorithms both experimentally and analytically. To address the performance problems that high cache miss penalties introduce we restructure mergesort, quicksort, and heapsort in order to improve their cache locality. For all three algorithms the improvement in cache performance leads to a reduction in total execution time. We also investigate the performance of radix sort. Despite the extremely low instruction count incurred by this linear time sorting algorithm, its relatively poor cache performance results in worse overall performance than the efficient comparison based sorting algorithms. For each algorithm we provide an analysis that closely predicts the number of cache misses incurred by the algorithm.
A Limit Theorem for "Quicksort"
- Applications/Theoretical Informatics and Applications
, 1999
Cited by 82 (2 self)
Let $X_n$ be the number of comparisons needed by the sorting algorithm Quicksort to sort a list of $n$ numbers into their natural ordering. We show that $(X_n - E(X_n))/n$ converges weakly to some random variable $Y$. The distribution of $Y$ is characterized as the fixed point of some contraction. It satisfies a recursive equation, which is used to provide recursive relations for the moments. The random variable $Y$ has exponential tails. Therefore the probability that Quicksort performs badly, e.g. that $X_n$ is larger than $2E(X_n)$, converges polynomially fast of every order to zero. Résumé: Let $X_n$ be the number of comparisons used by the Quicksort procedure to sort a list of distinct numbers. We show that $(X_n - E(X_n))/n$ converges weakly to a certain random variable $Y$. The distribution of $Y$ is the fixed point of a contraction and can be computed numerically by iteration. Keywords: sorting algorithm quicksort, fixed point, asymptotic distribut...
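The centering term E(X_n) in this result can be computed exactly, which gives a feel for the scale of the fluctuations. The sketch below is an illustration and is not taken from the paper. It assumes the textbook model in which each partitioning step costs n - 1 comparisons and the pivot's rank is uniform on 1..n, so that E(X_n) = (n - 1) + (2/n) * (E(X_0) + ... + E(X_{n-1})). The function names are hypothetical.

```python
from fractions import Fraction


def expected_comparisons(n_max):
    """Return [E(X_0), ..., E(X_n_max)] as exact fractions via the recurrence."""
    e = [Fraction(0)]
    running_sum = Fraction(0)  # E(X_0) + ... + E(X_{n-1})
    for n in range(1, n_max + 1):
        running_sum += e[n - 1]
        e.append(Fraction(n - 1) + 2 * running_sum / n)
    return e


def closed_form(n):
    """Well-known closed form 2(n + 1)H_n - 4n, with H_n the n-th harmonic number."""
    harmonic = sum(Fraction(1, k) for k in range(1, n + 1))
    return 2 * (n + 1) * harmonic - 4 * n
```

Under these assumptions the recurrence and the closed form agree for every n, and E(X_n) grows like 2n ln n, so the event that X_n exceeds 2E(X_n) is a deviation far larger than the order-n fluctuations the theorem describes.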
On the Analysis of Stochastic Divide and Conquer Algorithms.
, 1999
Cited by 44 (1 self)
This paper develops general tools for the analysis of stochastic divide and conquer algorithms. We concentrate on the average performance and the distribution of the duration of the algorithm. In particular we analyse the average performance and the running time distribution of the 2k + 1-median version of Quicksort.
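As a rough illustration of the (2k + 1)-median variant, the pivot can be chosen as the median of a random sample of 2k + 1 elements. The sketch below is a simplified, non-in-place version written for clarity and is not the formulation analysed in the paper. The function name, the seeded `rng` default, and the fallback to `sorted` for tiny inputs are all assumptions.

```python
import random


def median_of_sample_quicksort(values, k=1, rng=None):
    """Sort `values`, choosing each pivot as the median of 2k + 1 sampled elements."""
    if rng is None:
        rng = random.Random(0)  # fixed seed only so the sketch is reproducible
    items = list(values)
    sample_size = 2 * k + 1
    if len(items) <= sample_size:
        return sorted(items)  # small inputs: not worth sampling
    sample = rng.sample(items, sample_size)
    pivot = sorted(sample)[k]  # median of the sample
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    return (median_of_sample_quicksort(smaller, k, rng)
            + equal
            + median_of_sample_quicksort(larger, k, rng))
```

With k = 0 this reduces to ordinary random-pivot Quicksort. Larger k makes balanced splits more likely, at the cost of sorting a sample at each step.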
General Method of Program Code Obfuscation
, 2002
Cited by 32 (0 self)
Obfuscation can be a simple tool for software protection. In this paper we present a method of machine code obfuscation, which can be applied to most present processors. The obfuscation method is based on a theory, which led to two useful theorems. The proposed algorithm of obfuscation was implemented and tested using analytical and empirical approaches. The obtained results give the first estimation of the maximum possible efficiency of the obfuscation process.
A Dynamically Tuned Sorting Library
, 2004
Cited by 30 (7 self)
Empirical search is a strategy used during the installation of library generators such as ATLAS, FFTW, and SPIRAL to identify the algorithm or the version of an algorithm that delivers the best performance. In the past, empirical search has been applied almost exclusively to scientific problems. In this paper, we discuss the application of empirical search to sorting, which is one of the best understood symbolic computing problems. When contrasted with the dense numerical computations of ATLAS, FFTW, and SPIRAL, sorting presents a new challenge, namely that the relative performance of the algorithms depends not only on the characteristics of the target machine and the size of the input data but also on the distribution of values in the input data set. Empirical search is applied in the study reported here as part of a sorting library generator. The resulting routines dynamically adapt to the characteristics of the input data by selecting the best sorting algorithm from a small set of alternatives. To generate the run-time selection mechanism, our generator makes use of machine learning to predict the best algorithm as a function of the characteristics of the input data set and the performance of the different algorithms on the target machine. This prediction is based on the data obtained through empirical search at installation time. Our results show that our approach is quite effective. When sorting data inputs of 12M keys with various standard deviations, our adaptive approach selected the best algorithm for all the input data sets and all platforms that we tried in our experiments. The wrong decision could have introduced a performance degradation of up to 133%, with an average value of 44%.
A Fixed Point Theorem For Distributions
- Stochastic Processes and Their Applications
, 1999
Cited by 30 (7 self)
We study in a systematic form the contractive behavior of the map $S$ of distributions to distributions, $S(F) = \mathcal{L}(\sum_i T_i X_i + C)$, where $(C, T = (T_1, T_2, \ldots))$, $X_i$ are independent r.v., $\mathcal{L}(X_i) = F$. Further we show higher and exponential moments of the fixed point. Applications of this structure are given for a) (generalized) branching processes b) the Hausdorff dimension of random Cantor sets and c) the sorting algorithm Quicksort. 1 Introduction Our main concern is the existence of fixed points for the map $S$ of distribution functions to distribution functions defined by $S(F) \stackrel{D}{=} \sum_i T_i X_i + C$. (1) Here $\stackrel{D}{=}$ denotes equality in distribution, $(C, T = (T_1, T_2, \ldots))$, $X_i$, $i \in \mathbb{N}$, are independent random variables, $X_\cdot \stackrel{D}{=} F$. These fixed points appear in several quite different examples. They were not considered so far under this point of view. Let us start with some examples. The normal $N(0, 1)$ distribution is a fixed point for $S(F) \stackrel{D}{=} 2^{-1/2} X_1 +$...
Introspective Sorting and Selection Algorithms
- Software Practice and Experience
, 1997
Cited by 27 (1 self)
Quicksort is the preferred in-place sorting algorithm in many contexts, since its average computing time on uniformly distributed inputs is $\Theta(N \log N)$ and it is in fact faster than most other sorting algorithms on most inputs. Its drawback is that its worst-case time bound is $\Theta(N^2)$. Previous attempts to protect against the worst case by improving the way quicksort chooses pivot elements for partitioning have increased the average computing time too much; one might as well use heapsort, which has a $\Theta(N \log N)$ worst-case time bound but is on the average 2 to 5 times slower than quicksort. A similar dilemma exists with selection algorithms (for finding the i-th largest element) based on partitioning. This paper describes a simple solution to this dilemma: limit the depth of partitioning, and for subproblems that exceed the limit switch to another algorithm with a better worst-case bound. Using heapsort as the "stopper" yields a sorting algorithm that is just as fast as quicksort in the average case but also has a $\Theta(N \log N)$ worst-case time bound. For selection, a hybrid of Hoare's find algorithm, which is linear on average but quadratic in the worst case, and the Blum-Floyd-Pratt-Rivest-Tarjan algorithm is as fast as Hoare's algorithm in practice, yet has a linear worst-case time bound. Also discussed are issues of implementing the new algorithms as generic algorithms and accurately measuring their performance in the framework of the C++ Standard Template Library.
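The depth-limiting idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's C++ code. The depth limit of 2 * floor(log2 N), the middle-element pivot, and the use of the standard `heapq` module as a stand-in for an in-place heapsort are choices made here for brevity, and the helper names are hypothetical.

```python
import heapq
import math


def introsort(values):
    """Return a sorted copy: quicksort that falls back to a heap-based sort."""
    items = list(values)
    n = len(items)
    depth_limit = 2 * int(math.log2(n)) if n > 1 else 0  # assumed limit

    def heapsort_range(lo, hi):
        # Stand-in "stopper": heap-sort items[lo:hi] with O(m log m) worst case.
        heap = items[lo:hi]
        heapq.heapify(heap)
        for i in range(lo, hi):
            items[i] = heapq.heappop(heap)

    def partition(lo, hi):
        # Lomuto partition of items[lo:hi] around the middle element.
        mid = (lo + hi) // 2
        items[mid], items[hi - 1] = items[hi - 1], items[mid]
        pivot = items[hi - 1]
        store = lo
        for i in range(lo, hi - 1):
            if items[i] < pivot:
                items[i], items[store] = items[store], items[i]
                store += 1
        items[store], items[hi - 1] = items[hi - 1], items[store]
        return store

    def sort(lo, hi, depth):
        if hi - lo <= 1:
            return
        if depth == 0:  # partitioning went too deep: switch algorithms
            heapsort_range(lo, hi)
            return
        p = partition(lo, hi)
        sort(lo, p, depth - 1)
        sort(p + 1, hi, depth - 1)

    sort(0, n, depth_limit)
    return items
```

An input of identical keys drives this partition scheme into its unbalanced case, so the recursion reaches the depth limit quickly and the remaining range is handed to the heap-based fallback. That is the behaviour the abstract describes for subproblems that exceed the limit.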

