Results 1-10 of 35
Fast Algorithms for Sorting and Searching Strings
1997
Cited by 148 (0 self)
We present theoretical algorithms for sorting and searching multikey data, and derive from them practical C implementations for applications in which keys are character strings. The sorting algorithm blends Quicksort and radix sort; it is competitive with the best known C sort codes. The searching algorithm blends tries and binary search trees; it is faster than hashing and other commonly used search methods. The basic ideas behind the algorithms date back at least to the 1960s, but their practical utility has been overlooked. We also present extensions to more complex string problems, such as partial-match searching. 1. Introduction Section 2 briefly reviews Hoare's [9] Quicksort and binary search trees. We emphasize a well-known isomorphism relating the two, and summarize other basic facts. The multikey algorithms and data structures are presented in Section 3. Multikey Quicksort orders a set of n vectors with k components each. Like regular Quicksort, it partitions its input into...
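The blend of Quicksort and radix sort described in this abstract can be illustrated with a minimal sketch. It is written in Python for brevity rather than the C the paper targets, it builds new lists instead of partitioning in place, and the name `multikey_quicksort` and the middle-element pivot choice are assumptions made for illustration, not the authors' code.

```python
def multikey_quicksort(keys, depth=0):
    """Return a sorted copy of a list of strings (illustrative sketch)."""
    if len(keys) <= 1:
        return list(keys)

    def char_at(s):
        # -1 stands for "end of string", so a shorter string sorts first.
        return ord(s[depth]) if depth < len(s) else -1

    # Assumption: take the pivot character from the middle key.
    pivot = char_at(keys[len(keys) // 2])

    # Three-way partition on the character at position `depth`.
    less = [s for s in keys if char_at(s) < pivot]
    equal = [s for s in keys if char_at(s) == pivot]
    greater = [s for s in keys if char_at(s) > pivot]

    # Keys in `equal` agree on this character, so move on to the next one.
    # If pivot == -1 they have all ended and are identical; nothing to sort.
    if pivot != -1:
        equal = multikey_quicksort(equal, depth + 1)

    # `less` and `greater` still have to be ordered on the same character.
    return (multikey_quicksort(less, depth)
            + equal
            + multikey_quicksort(greater, depth))
```

The Quicksort part is the three-way split around a pivot; the radix part is that only one character position is examined per partitioning pass.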
Dynamical Sources in Information Theory: A General Analysis of Trie Structures
ALGORITHMICA, 1999
Cited by 50 (7 self)
Digital trees, also known as tries, are a general purpose flexible data structure that implements dictionaries built on sets of words. An analysis is given of three major representations of tries in the form of array-tries, list-tries, and bst-tries ("ternary search tries"). The size and the search costs of the corresponding representations are analysed precisely in the average case, while a complete distributional analysis of height of tries is given. The unifying data model used is that of dynamical sources and it encompasses classical models like those of memoryless sources with independent symbols, of finite Markov chains, and of nonuniform densities. The probabilistic behaviour of the main parameters, namely size, path length, or height, appears to be determined by two intrinsic characteristics of the source: the entropy and the probability of letter coincidence. These characteristics are themselves related in a natural way to spectral properties of specific transfer operators of the Ruelle type.
Fast Text Searching for Regular Expressions or Automaton Searching on Tries
Cited by 49 (6 self)
We present algorithms for efficient searching of regular expressions on preprocessed text, using a Patricia tree as a logical model for the index. We obtain searching algorithms that run in logarithmic expected time in the size of the text for a wide subclass of regular expressions, and in sublinear expected time for any regular expression. This is the first such algorithm to be found with this complexity.
Analytic Variations on QuadTrees
1991
Cited by 28 (4 self)
Quadtrees constitute a hierarchical data structure which permits fast access to multidimensional data. This paper presents the analysis of the expected cost of various types of searches in quadtrees: fully specified and partial match queries. The data model assumes random points with independently drawn coordinate values. The analysis leads to a class of "full-history" divide-and-conquer recurrences. These recurrences are solved using generating functions, either exactly for dimension d = 2, or asymptotically for higher dimensions. The exact solutions involve hypergeometric functions. The general asymptotic solutions rely on the classification of singularities of linear differential equations with analytic coefficients, and on singularity analysis techniques. These methods are applicable to the asymptotic solution of a wide range of linear recurrences, as may occur in particular in the analysis of multidimensional searching problems.
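To make the distinction between fully specified and partial match queries concrete, here is a small sketch of a two-dimensional point quadtree in Python. The names `QuadNode`, `quadrant`, `insert`, and `partial_match_x` are hypothetical, and the convention that ties go to the "greater or equal" side is an assumption of this sketch, not something stated in the abstract.

```python
class QuadNode:
    """One stored point plus four child subtrees, one per quadrant."""
    def __init__(self, point):
        self.point = point
        self.children = [None, None, None, None]


def quadrant(node_pt, pt):
    # Bit 0 is set when pt is on the right (x >= node x),
    # bit 1 is set when pt is above (y >= node y).
    return (1 if pt[0] >= node_pt[0] else 0) | (2 if pt[1] >= node_pt[1] else 0)


def insert(root, pt):
    """Insert pt and return the root (a fully specified descent)."""
    if root is None:
        return QuadNode(pt)
    node = root
    while True:
        q = quadrant(node.point, pt)
        if node.children[q] is None:
            node.children[q] = QuadNode(pt)
            return root
        node = node.children[q]


def partial_match_x(node, x, out=None):
    """Collect every stored point whose x equals `x`; y is left unspecified."""
    if out is None:
        out = []
    if node is None:
        return out
    if node.point[0] == x:
        out.append(node.point)
    # The x side is determined by the query; both y sides must be explored.
    xbit = 1 if x >= node.point[0] else 0
    for ybit in (0, 2):
        partial_match_x(node.children[xbit | ybit], x, out)
    return out
```

A fully specified search follows a single root-to-leaf path, as `insert` does, whereas the partial match query branches into two of the four subtrees at every node it visits; that difference in branching is what separates the two kinds of search cost.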
Burst Tries: A Fast, Efficient Data Structure for String Keys
ACM Transactions on Information Systems, 2002
Cited by 28 (10 self)
Many applications depend on efficient management of large sets of distinct strings in memory. For example, during index construction for text databases a record is held for each distinct word in the text, containing the word itself and information such as counters. We propose a new data structure, the burst trie, that has significant advantages over existing options for such applications: it requires no more memory than a binary tree; it is as fast as a trie; and, while not as fast as a hash table, a burst trie maintains the strings in sorted or near-sorted order. In this paper we describe burst tries and explore the parameters that govern their performance. We experimentally determine good choices of parameters, and compare burst tries to other structures used for the same task, with a variety of data sets. These experiments show that the burst trie is particularly effective for the skewed frequency distributions common in text collections, and dramatically outperforms all other data structures for the task of managing strings while maintaining sort order.
Singularity Analysis, Hadamard Products, and Tree Recurrences
2003
Cited by 28 (9 self)
We present a toolbox for extracting asymptotic information on the coefficients of combinatorial generating functions. This toolbox notably includes a treatment of the effect of Hadamard products on singularities in the context of the complex Tauberian technique known as singularity analysis. As a consequence, it becomes possible to unify the analysis of a number of divide-and-conquer algorithms, or equivalently random tree models, including several classical methods for sorting, searching, and dynamically managing equivalence relations.
On a multivariate contraction method for random recursive structures with applications to Quicksort
2001
Cited by 28 (15 self)
The contraction method for recursive algorithms is extended to the multivariate analysis of vectors of parameters of recursive structures and algorithms. We prove a general multivariate limit law which also leads to an approach to asymptotic covariances and correlations of the parameters. As an application, the asymptotic correlations and a bivariate limit law for the number of key comparisons and exchanges of median-of-(2t + 1) Quicksort are given. Moreover, for the Quicksort programs analyzed by Sedgewick the exact order of the standard deviation and a limit law follow, considering all the parameters counted by Sedgewick.
The Analysis of Hybrid Trie Structures
1998
Cited by 24 (2 self)
This paper provides a detailed analysis of various implementations of digital tries, including the “ternary search tries” of Bentley and Sedgewick. The methods employed combine symbolic uses of generating functions, Poisson models, and Mellin transforms. Theoretical results are matched against real-life data and justify the claim that ternary search tries are a highly efficient dynamic dictionary structure for strings and textual data.
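For readers unfamiliar with the structure being analysed, a ternary search trie can be sketched as follows. This is a minimal Python illustration, assuming non-empty string keys; the names `TSTNode`, `tst_insert`, and `tst_contains` are hypothetical and the code is not taken from the paper.

```python
class TSTNode:
    """A node holds one character and three links: lower, equal, higher."""
    __slots__ = ("ch", "lo", "eq", "hi", "is_end")

    def __init__(self, ch):
        self.ch = ch
        self.lo = self.eq = self.hi = None
        self.is_end = False  # True when a stored key ends at this node


def tst_insert(node, key, i=0):
    """Insert a non-empty key; returns the (possibly new) subtree root."""
    ch = key[i]
    if node is None:
        node = TSTNode(ch)
    if ch < node.ch:
        node.lo = tst_insert(node.lo, key, i)       # same character, go left
    elif ch > node.ch:
        node.hi = tst_insert(node.hi, key, i)       # same character, go right
    elif i + 1 < len(key):
        node.eq = tst_insert(node.eq, key, i + 1)   # match: next character
    else:
        node.is_end = True
    return node


def tst_contains(node, key):
    """Return True if `key` was inserted (empty keys are never stored)."""
    if not key:
        return False
    i = 0
    while node is not None:
        ch = key[i]
        if ch < node.ch:
            node = node.lo
        elif ch > node.ch:
            node = node.hi
        elif i + 1 < len(key):
            node = node.eq
            i += 1
        else:
            return node.is_end
    return False
```

Each node behaves like a binary search tree node over a single character, while the `eq` link plays the role of a trie edge to the next character position.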
Search costs in quadtrees and singularity perturbation asymptotics
Discrete Comput. Geom, 1994
Cited by 22 (5 self)
Quadtrees constitute a classical data structure for storing and accessing collections of points in multidimensional space. It is proved that, in any dimension, the cost of a random search in a randomly grown quadtree has logarithmic mean and variance and is asymptotically distributed as a normal variable. The limit distribution property extends to quadtrees of all dimensions a result only known so far to hold for binary search trees. The analysis is based on a technique of singularity perturbation that appears to be of some generality. For quadtrees, this technique is applied to linear differential equations satisfied by intervening bivariate generating functions.
An asymptotic theory for Cauchy-Euler differential equations with applications to the analysis of algorithms
2002
Cited by 22 (10 self)
Cauchy-Euler differential equations surfaced naturally in a number of sorting and searching problems, notably in quicksort and binary search trees and their variations. Asymptotics of coefficients of functions satisfying such equations has been studied for several special cases in the literature. We study in this paper the most general framework for Cauchy-Euler equations and propose an asymptotic theory that covers almost all applications where Cauchy-Euler equations appear. Our approach is very general and requires almost no background on differential equations. Indeed the whole theory can be stated in terms of recurrences instead of functions. Old and new applications of the theory are given. New phase changes of limit laws of new variations of quicksort are systematically derived. We apply our theory to about a dozen diverse examples in quicksort, binary search trees, urn models, increasing trees, etc.
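As a concrete instance of how such an equation arises, consider the average number of comparisons of standard Quicksort. The derivation below is the classical textbook one, included only as an illustration; it is not reproduced from the paper, and the symbols C_n, C(z), and H_n are notation chosen here.

```latex
\begin{align*}
% Average comparisons of standard Quicksort, with C_0 = 0:
C_n &= n - 1 + \frac{2}{n} \sum_{k=0}^{n-1} C_k , \qquad n \ge 1 , \\
% Multiply by n z^{n-1} and sum over n >= 1, with C(z) = \sum_{n \ge 0} C_n z^n:
C'(z) &= \frac{2z}{(1-z)^3} + \frac{2}{1-z}\, C(z) , \\
% Clearing the denominator gives a first-order Cauchy-Euler equation in (1 - z):
(1-z)\, C'(z) - 2\, C(z) &= \frac{2z}{(1-z)^2} , \\
% Solving with C(0) = 0 and extracting coefficients:
C_n &= 2 (n+1) H_n - 4n , \qquad H_n = \sum_{j=1}^{n} \frac{1}{j} .
\end{align*}
```

As a quick check, the closed form gives C_1 = 0, C_2 = 1, and C_3 = 8/3, which agree with the recurrence.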