Mellin transforms and asymptotics: Finite differences and Rice's integrals
1995
Cited by 82 (8 self)
Abstract
High order differences of simple number sequences may be analysed asymptotically by means of integral representations, residue calculus, and contour integration. This technique, akin to Mellin transform asymptotics, is put in perspective and illustrated by means of several examples related to combinatorics and the analysis of algorithms like digital tries, digital search trees, quadtrees, and distributed leader election.
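A small illustration of the kind of high-order difference these methods target. The identity is a standard textbook example chosen for this sketch, not one taken from the paper: the alternating binomial sum $S_n = \sum_{k=1}^{n} \binom{n}{k} \frac{(-1)^{k+1}}{k}$ equals the harmonic number $H_n = \ln n + \gamma + \frac{1}{2n} + O(n^{-2})$. Its individual terms grow exponentially in $n$ while the sum grows only logarithmically, which is the kind of cancellation that makes contour-integral methods attractive. The Python sketch checks the identity in exact rational arithmetic and compares it with the asymptotic expansion; it does not perform the residue computation itself.

```python
from fractions import Fraction
from math import comb, log

EULER_GAMMA = 0.5772156649015329  # Euler's constant, gamma


def alternating_sum(n: int) -> Fraction:
    """S_n = sum_{k=1}^{n} C(n, k) * (-1)^(k+1) / k, computed exactly."""
    return sum(
        (Fraction((-1) ** (k + 1) * comb(n, k), k) for k in range(1, n + 1)),
        Fraction(0),
    )


def harmonic(n: int) -> Fraction:
    """H_n = 1 + 1/2 + ... + 1/n, computed exactly."""
    return sum((Fraction(1, j) for j in range(1, n + 1)), Fraction(0))


for n in (10, 100, 500):
    s = alternating_sum(n)
    # Exact identity S_n == H_n, then compare with ln n + gamma + 1/(2n).
    print(n, s == harmonic(n), float(s), log(n) + EULER_GAMMA + 1 / (2 * n))
```

Exact fractions are used because evaluating the alternating sum in floating point would suffer from cancellation among the huge binomial terms.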
A Generalized Suffix Tree and Its (Un)Expected Asymptotic Behaviors
SIAM J. Computing, 1996
Cited by 52 (29 self)
Abstract
Suffix trees find several applications in computer science and telecommunications, most notably in algorithms on strings, data compression, and codes. Despite this, very little is known about their typical behaviors. In a probabilistic framework, we consider a family of suffix trees, further called b-suffix trees, built from the first n suffixes of a random word. In this family a noncompact suffix tree (i.e., one in which every edge is labeled by a single symbol) is represented by b = 1, and a compact suffix tree (i.e., one without unary nodes) is asymptotically equivalent to b → ∞ as n → ∞. We study several parameters of b-suffix trees, namely the depth of a given suffix, the depth of insertion, the height, and the shortest feasible path. Some new results concerning typical (i.e., almost sure) behaviors of these parameters are established. These findings are used to obtain several insights into certain algorithms on words, molecular biology, and universal data compression schemes. Key Wo...
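For b = 1 every stored suffix ends in its own leaf, so the depth of a suffix is one more than the longest common prefix (LCP) it shares with any other stored suffix, and the height is the largest such depth. The Python sketch below computes these quantities by brute force for the first n suffixes of a word drawn from a memoryless two-letter source. The function names, the symbol probability `p`, and the word length are hypothetical choices for illustration, and the word must be long enough that no compared suffix runs off its end.

```python
import random
from typing import List


def lcp(word: str, i: int, j: int) -> int:
    """Length of the longest common prefix of the suffixes word[i:] and word[j:]."""
    k = 0
    while i + k < len(word) and j + k < len(word) and word[i + k] == word[j + k]:
        k += 1
    return k


def suffix_trie_depths(word: str, n: int) -> List[int]:
    """Leaf depths of the first n suffixes in a noncompact suffix trie with b = 1.

    A stored suffix separates from all the others one symbol after the longest
    prefix it shares with any of them, so its depth is 1 + max LCP.
    """
    return [
        1 + max((lcp(word, i, j) for j in range(n) if j != i), default=-1)
        for i in range(n)
    ]


random.seed(1)
p = 0.7  # hypothetical probability of the symbol "a"
word = "".join("a" if random.random() < p else "b" for _ in range(5000))
depths = suffix_trie_depths(word, 200)
print("height:", max(depths), "mean depth:", sum(depths) / len(depths))
```

The quadratic number of LCP comparisons is fine for small experiments; it is not meant as an efficient construction.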
High-Performance IP Routing Table Lookup Using CPU Caching
Cited by 46 (7 self)
Abstract
Wire-speed IP (Internet Protocol) routers require very fast routing table lookup for incoming IP packets. The routing table lookup operation is time-consuming because the part of an IP address used in the lookup, i.e., the network address portion, is variable in length. This paper describes the routing table lookup algorithm used in a cluster-based parallel IP router project called Suez. The innovative aspect of this algorithm is its ability to use CPU caching hardware to perform routing table caching and lookup directly by carefully mapping IP addresses to virtual addresses. By running a detailed simulation model that incorporates the performance effects of the CPU memory hierarchy against a packet trace collected from a major network router, we show that the overall performance of the proposed algorithm can reach 87.87 million lookups per second for a 500-MHz Alpha processor with a 16-KByte L1 cache and a 1-MByte L2 cache. This result is one to two orders of magnitude faster than pre...
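The variable-length lookup described here is longest-prefix matching. As a point of reference only, the Python sketch below performs it with a plain binary trie over the 32 address bits, using the standard `ipaddress` module. It illustrates the problem, not the Suez scheme of mapping IP addresses to virtual addresses so that the CPU caches do the work; the class name and next-hop labels are hypothetical.

```python
import ipaddress
from typing import Dict, Optional


class PrefixTrie:
    """Binary trie over IPv4 address bits for longest-prefix matching (illustrative)."""

    def __init__(self) -> None:
        # Children are keyed by bit 0/1; the key "hop" marks the end of a stored prefix.
        self.root: Dict = {}

    def insert(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.IPv4Network(prefix)
        addr = int(net.network_address)
        node = self.root
        for k in range(net.prefixlen):
            node = node.setdefault((addr >> (31 - k)) & 1, {})
        node["hop"] = next_hop

    def lookup(self, address: str) -> Optional[str]:
        """Next hop of the longest stored prefix matching `address`, or None."""
        addr = int(ipaddress.IPv4Address(address))
        node, best = self.root, self.root.get("hop")
        for k in range(32):
            node = node.get((addr >> (31 - k)) & 1)
            if node is None:
                break
            best = node.get("hop", best)  # a deeper match overrides a shorter one
        return best


table = PrefixTrie()
table.insert("10.0.0.0/8", "A")
table.insert("10.1.0.0/16", "B")
print(table.lookup("10.1.2.3"), table.lookup("10.2.0.1"), table.lookup("192.168.1.1"))  # B A None
```

Each lookup may touch up to 32 trie nodes scattered in memory, which is the kind of per-packet cost that motivates cache-aware designs such as the one described above.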
Burst Tries: A Fast, Efficient Data Structure for String Keys
ACM Transactions on Information Systems, 2002
Cited by 28 (10 self)
Abstract
Many applications depend on efficient management of large sets of distinct strings in memory. For example, during index construction for text databases a record is held for each distinct word in the text, containing the word itself and information such as counters. We propose a new data structure, the burst trie, that has significant advantages over existing options for such applications: it requires no more memory than a binary tree; it is as fast as a trie; and, while not as fast as a hash table, a burst trie maintains the strings in sorted or near-sorted order. In this paper we describe burst tries and explore the parameters that govern their performance. We experimentally determine good choices of parameters, and compare burst tries to other structures used for the same task, with a variety of data sets. These experiments show that the burst trie is particularly effective for the skewed frequency distributions common in text collections, and dramatically outperforms all other data structures for the task of managing strings while maintaining sort order.
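A minimal sketch of the burst-trie idea under simplifying assumptions: each trie node has one slot per leading character, a slot holds either a child node or a small container of string tails, and a container is replaced ("burst") by a new trie node once it holds more than `limit` distinct tails. Containers here are plain Python dicts and the fixed size threshold is an assumed burst rule; the paper explores its own container structures and bursting parameters. An in-order walk that sorts each container on the fly yields the strings in sorted order with their counts.

```python
from typing import Dict, Iterator, Tuple, Union

Container = Dict[str, int]  # string tail -> occurrence count (simplified container)


class TrieNode:
    def __init__(self) -> None:
        self.end = 0  # occurrences of the string that ends exactly at this node
        self.slots: Dict[str, Union["TrieNode", Container]] = {}


class BurstTrie:
    def __init__(self, limit: int = 4) -> None:
        self.root = TrieNode()
        self.limit = limit  # assumed threshold: burst when a container exceeds it

    def insert(self, s: str, count: int = 1) -> None:
        self._insert(self.root, s, 0, count)

    def _insert(self, node: TrieNode, s: str, i: int, count: int) -> None:
        while True:
            if i == len(s):
                node.end += count
                return
            child = node.slots.setdefault(s[i], {})
            if isinstance(child, TrieNode):
                node, i = child, i + 1  # descend through the trie part
                continue
            tail = s[i + 1:]
            child[tail] = child.get(tail, 0) + count
            if len(child) > self.limit:
                self._burst(node, s[i], child)
            return

    def _burst(self, node: TrieNode, c: str, container: Container) -> None:
        """Replace a full container by a trie node and redistribute its tails."""
        new_node = TrieNode()
        node.slots[c] = new_node
        for tail, cnt in container.items():
            self._insert(new_node, tail, 0, cnt)  # may burst again recursively

    def items(self) -> Iterator[Tuple[str, int]]:
        """Yield (string, count) pairs in lexicographic order."""
        yield from self._walk(self.root, "")

    def _walk(self, node: TrieNode, prefix: str) -> Iterator[Tuple[str, int]]:
        if node.end:
            yield prefix, node.end
        for c in sorted(node.slots):
            child = node.slots[c]
            if isinstance(child, TrieNode):
                yield from self._walk(child, prefix + c)
            else:
                for tail in sorted(child):
                    yield prefix + c + tail, child[tail]


bt = BurstTrie(limit=2)
for w in ["the", "then", "a", "the", "there", "zebra", "them", "than"]:
    bt.insert(w)
print(list(bt.items()))
```

Frequent prefixes end up as trie nodes, while rare ones stay in compact containers, which is the trade-off between trie speed and tree-like memory use that the abstract highlights.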
Self-Alignment in Words and Their Applications
J. Algorithms, 1992
Cited by 27 (8 self)
Abstract
Some quantities associated with periodicities in words are analyzed within the Bernoulli probabilistic model. In particular, the following problem is addressed. Assume that a string X is given, with symbols emitted randomly but independently according to some known distribution of probabilities. Then, for each pair (W, Z) of distinct suffixes of X, the expected length of the longest common prefix of W and Z is sought. The collection of these lengths, which are called here self-alignments, plays a crucial role in several algorithmic problems on words, such as building suffix trees or inverted files, detecting squares and other regularities, computing substring statistics, etc. The asymptotically best algorithms for these problems are quite complex and thus risk being impractical. The present analysis of self-alignments and related measures suggests that, in a variety of cases, more straightforward algorithmic solutions may yield comparable or even better performance. Key words and ph...
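The self-alignments of a concrete string can be tabulated with a simple quadratic dynamic program: the longest common prefix of the suffixes starting at i and j is one more than that of the suffixes at i + 1 and j + 1 whenever x[i] = x[j], and zero otherwise. The Python sketch below does this and averages over all pairs of distinct suffixes of a string drawn from an assumed Bernoulli source. The function names and symbol probabilities are hypothetical, and on a finite string the prefixes are truncated at the end, so this is an empirical stand-in for the expectations analyzed in the paper, not the paper's analysis.

```python
import random
from typing import List


def self_alignments(x: str) -> List[List[int]]:
    """lcp[i][j] = length of the longest common prefix of x[i:] and x[j:]."""
    n = len(x)
    lcp = [[0] * (n + 1) for _ in range(n + 1)]  # row/column n: the empty suffix
    for i in range(n - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if x[i] == x[j]:
                lcp[i][j] = lcp[i + 1][j + 1] + 1
    return lcp


def mean_self_alignment(x: str) -> float:
    """Average self-alignment over all pairs of distinct suffixes of x."""
    n = len(x)
    if n < 2:
        return 0.0
    lcp = self_alignments(x)
    total = sum(lcp[i][j] for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) // 2)


random.seed(3)
probs = {"a": 0.5, "b": 0.3, "c": 0.2}  # hypothetical symbol distribution
x = "".join(random.choices(list(probs), weights=list(probs.values()), k=400))
print("mean self-alignment:", mean_self_alignment(x))
```

This table is exactly the kind of "straightforward" quantity the abstract contrasts with asymptotically optimal but complex algorithms.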
Size and Path Length of Patricia Tries: Dynamical Sources Context
2001
Cited by 8 (1 self)
Abstract
Digital trees, also known as tries, and Patricia tries are flexible data structures that occur in a variety of computer and communication algorithms, including dynamic hashing, partial match retrieval, searching and sorting, conflict resolution algorithms for broadcast communication, data compression, and so forth. We consider here tries and Patricia tries built from $n$ words emitted by a probabilistic dynamical source. Such sources encompass the classical source models, such as memoryless sources and finite Markov chains, and many more. The probabilistic behavior of the main parameters, namely the size and path length, appears to be determined by some intrinsic characteristics of the source, namely the entropy and two other constants, themselves related in a natural way to spectral properties of specific transfer operators of Ruelle type. Keywords: Average-case analysis of data structures, Information Theory, Trie, Mellin analysis, Dynamical systems, Ruelle operator, Functional Analysis.
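Both parameters can be computed directly from a set of keys: recursively split the keys by their next symbol, count one internal node per split, and add each key's leaf depth to the path length. A Patricia trie differs only in skipping one-way branches without creating a node. The Python sketch below assumes a memoryless binary source with a hypothetical probability `p`, one of the simple special cases the dynamical-source framework covers; the function name is illustrative, and keys must be distinct and prefix-free.

```python
import random
from collections import defaultdict
from typing import List, Tuple


def size_and_path_length(
    keys: List[str], patricia: bool, pos: int = 0, level: int = 0
) -> Tuple[int, int]:
    """(internal nodes, external path length) of the trie or Patricia trie on `keys`.

    `pos` is the symbol position being examined and `level` the node depth;
    keys must be distinct and none may be a prefix of another.
    """
    if len(keys) <= 1:
        return 0, level * len(keys)
    groups = defaultdict(list)
    for key in keys:
        groups[key[pos]].append(key)
    if patricia and len(groups) == 1:
        # One-way branching: a Patricia trie skips this symbol without a node.
        return size_and_path_length(keys, patricia, pos + 1, level)
    size, path_length = 1, 0
    for group in groups.values():
        s, pl = size_and_path_length(group, patricia, pos + 1, level + 1)
        size += s
        path_length += pl
    return size, path_length


random.seed(5)
p = 0.7  # hypothetical probability of bit "1" in a memoryless source
keys = list({"".join("1" if random.random() < p else "0" for _ in range(64)) for _ in range(1000)})
print("trie:", size_and_path_length(keys, patricia=False))
print("Patricia:", size_and_path_length(keys, patricia=True))
```

For a binary alphabet every Patricia node has exactly two children, so its size is always n − 1; there the randomness lies in the trie size and in both path lengths, while for larger alphabets the Patricia size varies as well.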
Summary Structures for Frequency Queries on Large Transaction Sets
In Data Compression Conference, 2000
Cited by 7 (1 self)
Abstract
As large-scale databases become commonplace, there has been significant interest in mining them for commercial purposes. One of the basic tasks that underlies many of these mining operations is querying transaction sets for frequencies of specified attribute values. The size of these databases makes it important to develop summary structures capable of high compression ratios as well as supporting fast frequency queries. The nature of the problem and its differences from traditional text compression allow very high compression ratios. In this paper, we propose a binary trie-based summary structure for representing transaction sets. We demonstrate that this trie structure, when augmented with an appropriate set of horizontal pointers, can support frequency queries several orders of magnitude faster than raw transaction data. We improve the memory characteristics of our scheme by compressing the trie into a Patricia trie and demonstrate that this does not have...
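A minimal sketch of answering frequency queries from a trie of transactions, under assumptions that go beyond the abstract: items are integers, each transaction is inserted as its sorted item list into a multiway prefix tree (not the binary trie the paper proposes), every node counts the transactions passing through it, and the "horizontal pointers" are modeled as hypothetical per-item lists of nodes with parent links. Because each transaction containing the query's largest item passes through exactly one node labeled with that item, the query only visits those nodes and checks their ancestors.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


class Node:
    def __init__(self, item: Optional[int], parent: Optional["Node"]) -> None:
        self.item = item
        self.parent = parent
        self.count = 0  # transactions whose sorted item list passes through this node
        self.children: Dict[int, "Node"] = {}


class TransactionTrie:
    def __init__(self) -> None:
        self.root = Node(None, None)
        # Assumed "horizontal pointers": all nodes labeled with a given item.
        self.links: Dict[int, List[Node]] = defaultdict(list)

    def add(self, transaction: Iterable[int]) -> None:
        node = self.root
        node.count += 1
        for item in sorted(set(transaction)):
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                self.links[item].append(child)
            child.count += 1
            node = child

    def frequency(self, itemset: Iterable[int]) -> int:
        """Number of stored transactions that contain every item in `itemset`."""
        items = set(itemset)
        if not items:
            return self.root.count
        last = max(items)
        rest = items - {last}
        total = 0
        for node in self.links.get(last, []):
            ancestors = set()
            p = node.parent
            while p is not None and p.item is not None:
                ancestors.add(p.item)
                p = p.parent
            if rest <= ancestors:  # the path to this node holds the other items
                total += node.count
        return total


trie = TransactionTrie()
for t in ([1, 2, 3], [1, 3], [2, 3], [1, 2]):
    trie.add(t)
print(trie.frequency([1, 3]), trie.frequency([2]))  # 2 3
```

Shared prefixes of sorted transactions are stored once, which is where the compression comes from; the per-item lists avoid scanning the whole tree for each query.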
Compact Suffix Trees Resemble Patricia Tries: Limiting Distribution of Depth
Journal of the Iranian Statistical Society, 1993
Cited by 4 (1 self)
Abstract
Suffix trees are the most frequently used data structure in algorithms on words. Despite this, little is known about their behavior in a probabilistic framework. In this paper, we consider the depth of a compact suffix tree, also known as the PAT tree, under some simple probabilistic assumptions. In fact, for the case of an asymmetric alphabet, we prove that the limiting distribution for the depth in a PAT tree is the same as the limiting distribution for the depth in a PATRICIA trie, even though the PATRICIA trie is constructed over statistically independent strings. In other words, the limiting distribution for the depth in a PAT tree storing n suffixes is normal.
This research was primarily supported by NATO Collaborative Grant 0057/89.
† This research was in part supported by AFOSR grant 900107 and by NSF grant CCR8900305.
‡ This author's research was supported in part by NATO Collaborative grant 00570/89, and in part by AFOSR grant 900107, by NSF grants CCR9201078 and NCR9206315...
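Depth in a compact suffix (PAT) tree can be computed without building the tree: every branching node on a suffix's path sits at a string depth where some other stored suffix diverges from it, so the depth, counted in branching nodes, is the number of distinct longest-common-prefix lengths between that suffix and the others. The Python sketch below uses this to collect simulated depths for an asymmetric two-letter source. The function names, the probability `p`, the word length, n, and the number of trials are hypothetical, and the sample mean and variance are only an informal look at the normal limit the paper proves.

```python
import random
import statistics


def lcp(word: str, i: int, j: int) -> int:
    """Length of the longest common prefix of the suffixes word[i:] and word[j:]."""
    k = 0
    while i + k < len(word) and j + k < len(word) and word[i + k] == word[j + k]:
        k += 1
    return k


def pat_tree_depth(word: str, i: int, n: int) -> int:
    """Branching-node depth of suffix i in the PAT tree of the first n suffixes."""
    return len({lcp(word, i, j) for j in range(n) if j != i})


random.seed(7)
p, length, n, trials = 0.7, 1000, 100, 200  # hypothetical simulation settings
samples = []
for _ in range(trials):
    word = "".join("a" if random.random() < p else "b" for _ in range(length))
    samples.append(pat_tree_depth(word, 0, n))
print("mean:", statistics.mean(samples), "variance:", statistics.variance(samples))
```

Counting only branching nodes matches the compact (unary-free) tree; a nonbranching root is not counted under this convention.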