Fast Algorithms for Sorting and Searching Strings
1997
Abstract

Cited by 148 (0 self)
We present theoretical algorithms for sorting and searching multikey data, and derive from them practical C implementations for applications in which keys are character strings. The sorting algorithm blends Quicksort and radix sort; it is competitive with the best known C sort codes. The searching algorithm blends tries and binary search trees; it is faster than hashing and other commonly used search methods. The basic ideas behind the algorithms date back at least to the 1960s, but their practical utility has been overlooked. We also present extensions to more complex string problems, such as partial-match searching.
1. Introduction
Section 2 briefly reviews Hoare's [9] Quicksort and binary search trees. We emphasize a well-known isomorphism relating the two, and summarize other basic facts. The multikey algorithms and data structures are presented in Section 3. Multikey Quicksort orders a set of n vectors with k components each. Like regular Quicksort, it partitions its input into...
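The blend of Quicksort and radix sort described in this abstract can be illustrated with a short sketch. The code below is not the paper's C implementation; it is a minimal Python rendering of the commonly described scheme, assuming a three-way split (less than, equal to, greater than the pivot) on one character position at a time. The names `multikey_quicksort`, `_char_at`, and `_sort` are hypothetical.

```python
def _char_at(s, depth):
    # Assumption: -1 stands for "end of string", so shorter strings sort first.
    return ord(s[depth]) if depth < len(s) else -1


def _sort(a, lo, hi, depth):
    # Sort a[lo:hi]; every string in that range shares its first `depth` characters.
    while hi - lo > 1:
        pivot = _char_at(a[(lo + hi) // 2], depth)
        lt, i, gt = lo, lo, hi - 1
        # Three-way partition on the character at position `depth`.
        while i <= gt:
            c = _char_at(a[i], depth)
            if c < pivot:
                a[lt], a[i] = a[i], a[lt]
                lt += 1
                i += 1
            elif c > pivot:
                a[i], a[gt] = a[gt], a[i]
                gt -= 1
            else:
                i += 1
        # Quicksort-like step: smaller and larger parts, same character position.
        _sort(a, lo, lt, depth)
        _sort(a, gt + 1, hi, depth)
        if pivot == -1:
            return  # the equal part consists of identical, fully consumed strings
        # Radix-like step: the equal part moves on to the next character.
        lo, hi, depth = lt, gt + 1, depth + 1


def multikey_quicksort(strings):
    """Return a sorted copy of `strings` (illustrative sketch only)."""
    items = list(strings)
    _sort(items, 0, len(items), 0)
    return items
```

The equal part is handled by iteration rather than recursion, so only the smaller and larger parts consume stack depth.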
A Survey of Adaptive Sorting Algorithms
1992
Abstract

Cited by 65 (3 self)
Introduction and Survey; F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems - Sorting and Searching; E.5 [Data]: Files - Sorting/searching; G.3 [Mathematics of Computing]: Probability and Statistics - Probabilistic algorithms; E.2 [Data Storage Representation]: Composite structures, linked representations.
General Terms: Algorithms, Theory.
Additional Key Words and Phrases: Adaptive sorting algorithms, Comparison trees, Measures of disorder, Nearly sorted sequences, Randomized algorithms.
CONTENTS
INTRODUCTION
I.1 Optimal adaptivity
I.2 Measures of disorder
I.3 Organization of the paper
1. WORST-CASE ADAPTIVE (INTERNAL) SORTING ALGORITHMS
1.1 Generic Sort
1.2 Cook-Kim division
1.3 Partition Sort
1.4 Exponential Search
1.5 Adaptive Merging
2. EXPECTED-CASE ADAPTIV
An asymptotic theory for Cauchy-Euler differential equations with applications to the analysis of algorithms
2002
Abstract

Cited by 22 (10 self)
Cauchy-Euler differential equations surfaced naturally in a number of sorting and searching problems, notably in quicksort and binary search trees and their variations. Asymptotics of coefficients of functions satisfying such equations have been studied for several special cases in the literature. We study in this paper the most general framework for Cauchy-Euler equations and propose an asymptotic theory that covers almost all applications where Cauchy-Euler equations appear. Our approach is very general and requires almost no background on differential equations. Indeed, the whole theory can be stated in terms of recurrences instead of functions. Old and new applications of the theory are given. New phase changes of limit laws of new variations of quicksort are systematically derived. We apply our theory to about a dozen diverse examples in quicksort, binary search trees, urn models, increasing trees, etc.
Grouping and Duplicate Elimination: Benefits of Early Aggregation
1997
Abstract

Cited by 11 (1 self)
Early aggregation is a technique for speeding up the processing of GROUP BY queries by reducing the amount of intermediate data transferred between main memory and disk. It can also be applied to duplicate elimination because duplicate elimination is equivalent to grouping with no aggregation functions. This paper describes six different algorithms for grouping and aggregation, shows how to incorporate early aggregation in each of them, and analyzes the resulting reduction in intermediate data. In addition to the grouping algorithm used, the reduction depends on several factors: the number of groups, the skew in group size distribution, the input size, and the amount of main memory available. All six algorithms considered benefit from early aggregation with grouping by hash partitioning producing the least amount of intermediate data. If the group size distribution is skewed, the overall reduction can be very significant, even with a modest amount of additional main memory.
Data Reduction Through Early Grouping
In Proceedings of the 1994 IBM CAS Conference, 1994
Abstract

Cited by 3 (0 self)
SQL queries containing GROUP BY and aggregation occur frequently in decision support applications. Grouping with aggregation is typically done by first sorting the input and then performing the aggregation as part of the output phase of the sort. The most widely used external sorting algorithm is merge sort, consisting of a run formation phase followed by a (single) merge pass. The amount of data output from the run formation phase can be reduced by a technique that we call early grouping. The idea is straightforward: simply form groups and perform aggregation during run formation. Each run will now consist of partial groups instead of individual records. These partial groups are then combined during the merge phase. Early grouping always reduces the number of records output from the run formation phase. The relative output size depends on the amount of memory relative to the total number of groups and the distribution of records over groups. When the input data is uniformly distributed...
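The idea of forming partial groups during run formation and combining them in the merge phase can be sketched as follows. This is an illustrative in-memory model, not the paper's algorithm: it assumes SUM as the aggregation function, models memory as a limit of `max_groups` distinct keys per run, and keeps runs in Python lists rather than on disk. The names `form_runs` and `merge_runs` are hypothetical.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def form_runs(records, max_groups):
    """Run formation with early grouping.

    Each run is a key-sorted list of (key, partial_sum) pairs holding at most
    `max_groups` partial groups (an assumed stand-in for the memory limit).
    """
    runs, table = [], {}
    for key, value in records:
        if key not in table and len(table) >= max_groups:
            runs.append(sorted(table.items()))  # memory full: write out a run
            table = {}
        table[key] = table.get(key, 0) + value  # aggregate while forming the run
    if table:
        runs.append(sorted(table.items()))
    return runs


def merge_runs(runs):
    """Single merge pass that combines the partial groups of all runs."""
    merged = heapq.merge(*runs, key=itemgetter(0))
    return [(key, sum(partial for _, partial in group))
            for key, group in groupby(merged, key=itemgetter(0))]
```

For the six input records `[("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("a", 6)]` and `max_groups=2`, the runs contain five partial groups in total, and the merge pass yields one total per key.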
Duplicate Detection and Deletion in the Extended NF² Data Model
In Proc. of the 1989 Intl. Conf. on Foundations of Data Organization and Algorithms, 1989
Abstract
A current research topic in the area of relational databases is the design of systems based on the Non First Normal Form (NF²) data model. One particular development, the so-called extended NF² data model, even permits structured values like lists and tuples to be included as attributes in relations. It is thus well suited to represent complex objects for nonstandard database applications. A DBMS which uses this model, called the Advanced Information Management Prototype, is currently being implemented at the IBM Heidelberg Scientific Center. In this paper we examine the problem of detecting and deleting duplicates within this data model. Several alternative approaches are evaluated and a new method, based on sorting complex objects, is proposed, which is both time- and space-efficient.
Partitioning schemes for quicksort and quickselect
2003
Abstract
We introduce several modifications of the partitioning schemes used in Hoare’s quicksort and quickselect algorithms, including ternary schemes which identify keys less or greater than the pivot. We give estimates for the numbers of swaps made by each scheme. Our computational experiments indicate that ternary schemes allow quickselect to identify all keys equal to the selected key at little additional cost. Key words. Sorting, selection, quicksort, quickselect, partitioning.
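A small sketch may help show how a ternary partition lets quickselect report every key equal to the selected one. The code below uses the textbook Dutch-national-flag partition with a random pivot, which is an assumption made for illustration and not one of the specific schemes analysed in the paper. The function name `quickselect_range` is hypothetical.

```python
import random


def quickselect_range(keys, k):
    """Return (value, first, last) for the k-th smallest key (0-based).

    `first` and `last` are the sorted-order positions of the first and last
    key equal to the selected value. Works on a copy of `keys`.
    """
    a = list(keys)
    if not 0 <= k < len(a):
        raise IndexError("k out of range")
    lo, hi = 0, len(a) - 1
    while True:
        pivot = a[random.randint(lo, hi)]
        lt, i, gt = lo, lo, hi
        # Ternary partition: a[lo:lt] < pivot, a[lt:gt+1] == pivot, a[gt+1:hi+1] > pivot.
        while i <= gt:
            if a[i] < pivot:
                a[lt], a[i] = a[i], a[lt]
                lt += 1
                i += 1
            elif a[i] > pivot:
                a[i], a[gt] = a[gt], a[i]
                gt -= 1
            else:
                i += 1
        if k < lt:
            hi = lt - 1          # continue in the "less" part
        elif k > gt:
            lo = gt + 1          # continue in the "greater" part
        else:
            return pivot, lt, gt  # k falls inside the block of equal keys
```

Because the equal block is delimited by the partition itself, the range of equal keys comes out of the final partitioning step without a separate scan.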
Pbit and other algorithms sorting lists.
2008
Abstract
Besides its simplicity, Pbit is definitely the fastest algorithm for sorting lists. It considerably surpasses all previously known methods. Among its many advantages, it is stable, linear and nonextensive. I will compare Pbit with the algorithm described by Donald E. Knuth [4] in the third volume of “The Art of Computer Programming” and with other algorithms for sorting lists.
1 Introduction to lists
A list is a set of independent data formats, very often called nodes. Particular nodes are connected with each other by means of pointers. Nodes are usually created dynamically, which makes for economical use of memory. We will deal with singly-linked lists because Pbit was created for such type of lists (which