Results 1-10 of 50
Index Structures for Path Expressions
1997
Cited by 285 (8 self)
In recent years there has been an increased interest in managing data which does not conform to traditional data models, like the relational or object-oriented model. The reasons for this nonconformance are diverse. On the one hand, data may not conform to such models at the physical level: it may be stored in data exchange formats, fetched from the Internet, or stored as structured files. On the other hand, it may not conform at the logical level: data may have missing attributes, some attributes may be of different types in different data items, there may be heterogeneous collections, or the data may simply be specified by a schema which is too complex or changes too often to be described easily as a traditional schema. The term semistructured data has been used to refer to such data. The data model proposed for this kind of data consists of an edge-labeled graph, in which nodes correspond to objects and edges to attributes or values. Figure 1 illustrates a semistructured database providing information about a city. Relational databases are traditionally queried with associative queries, retrieving tuples based on the value of some attributes. To answer such queries efficiently, database management systems support indexes for translating attribute values into tuple ids (e.g. B-trees or hash tables). In object-oriented databases, path queries replace the simpler associative queries. Several data structures have been proposed for answering path queries efficiently: e.g., access support relations [14] and path indexes [4]. In the case of semistructured data, queries are even more complex, because they may contain generalized path expressions [1, 7, 8, 16]. The additional flexibility is needed in order to traverse data whose structure is irregular, or partially unknown to the user.
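To make the edge-labeled graph model concrete, here is a minimal Python sketch. It is not the index structure proposed in the paper: the graph contents, the `_` wildcard, and the names `eval_path` and `build_label_index` are all assumptions made for illustration. It shows how a simple path expression can be evaluated by traversal, and how a naive table from label paths to node sets could answer the same query by lookup.

```python
from collections import defaultdict

# Hypothetical edge-labeled graph: node -> list of (label, target) pairs.
# Node ids and labels are invented; they are not taken from Figure 1.
graph = {
    "root": [("restaurant", "r1"), ("restaurant", "r2"), ("museum", "m1")],
    "r1": [("name", "n1"), ("phone", "p1")],
    "r2": [("name", "n2"), ("address", "a1")],
    "a1": [("street", "s1")],
    "m1": [("name", "n3")],
}


def eval_path(graph, start, labels):
    """Return the nodes reached from `start` by following `labels` in order.

    The label "_" is an assumed wildcard that matches any single edge.
    """
    frontier = {start}
    for label in labels:
        frontier = {
            target
            for node in frontier
            for (edge_label, target) in graph.get(node, [])
            if label == "_" or edge_label == label
        }
    return frontier


def build_label_index(graph, start, depth):
    """Naive path index: map every label path of length <= depth to its node set."""
    index = defaultdict(set)
    stack = [(start, ())]
    while stack:
        node, path = stack.pop()
        index[path].add(node)
        if len(path) < depth:
            for edge_label, target in graph.get(node, []):
                stack.append((target, path + (edge_label,)))
    return dict(index)
```

With this toy graph, `eval_path(graph, "root", ["_", "name"])` collects the name nodes of restaurants and museums alike, which is the kind of query that is awkward when the structure is only partially known.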
Mellin transforms and asymptotics: Finite differences and Rice's integrals
1995
Cited by 82 (8 self)
High order differences of simple number sequences may be analysed asymptotically by means of integral representations, residue calculus, and contour integration. This technique, akin to Mellin transform asymptotics, is put in perspective and illustrated by means of several examples related to combinatorics and the analysis of algorithms like digital tries, digital search trees, quadtrees, and distributed leader election.
Asymptotic Behavior of the Lempel-Ziv Parsing Scheme and Digital Search Trees
Theoretical Computer Science, 1995
Cited by 64 (30 self)
The Lempel-Ziv parsing scheme finds a wide range of applications, most notably in data compression and algorithms on words. It partitions a sequence of length $n$ into variable phrases such that a new phrase is the shortest substring not seen in the past as a phrase. The parameter of interest is the number $M_n$ of phrases that one can construct from a sequence of length $n$. In this paper, for the memoryless source with unequal probabilities of symbol generation, we derive the limiting distribution of $M_n$, which turns out to be normal. This settles a long-standing open problem. In fact, to obtain this result we solved another open problem, namely, that of establishing the limiting distribution of the internal path length in a digital search tree. The latter is a consequence of an asymptotic solution of a multiplicative differential-functional equation often arising in the analysis of algorithms on words. Interestingly enough, our findings are proved by a combination of probabilistic techniques such as the renewal equation and uniform integrability, and analytical techniques such as the Mellin transform, differential-functional equations, de-Poissonization, and so forth. In concluding remarks we indicate a possibility of extending our results to Markovian models.
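The parsing rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the abstract; the function name `lz_parse` and the choice to keep a trailing incomplete phrase are assumptions, not details taken from the paper.

```python
def lz_parse(sequence):
    """Split `sequence` into phrases, where each new phrase is the shortest
    prefix of the remaining input that has not yet appeared as a phrase."""
    seen = set()
    phrases = []
    current = ""
    for symbol in sequence:
        current += symbol
        if current not in seen:
            # `current` is new: close the phrase and start the next one.
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:
        # Assumption: keep the leftover tail even though it repeats a phrase.
        phrases.append(current)
    return phrases
```

For the input `"aababbbaa"` the phrases are `a`, `ab`, `abb`, `b`, `aa`, so the phrase count is 5. Every complete phrase extends an earlier phrase by one symbol, which is what allows the phrases to be stored as nodes of a digital search tree.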
Universal Limit Laws for Depths in Random Trees
SIAM Journal on Computing, 1998
Cited by 50 (8 self)
Random binary search trees, $b$-ary search trees, median-of-$(2k+1)$ trees, quadtrees, simplex trees, tries, and digital search trees are special cases of random split trees. For these trees, we offer a universal law of large numbers and a limit law for the depth of the last inserted point, as well as a law of large numbers for the height.
Improved Behaviour of Tries by Adaptive Branching
Cited by 32 (8 self)
We introduce and analyze a method to reduce the search cost in tries. Traditional trie structures use branching factors at the nodes that are either fixed or a function of the number of elements. Instead, we let the distribution of the elements guide the choice of branching factors. This is accomplished in a strikingly simple way: in a binary trie, the $i$ highest complete levels are replaced by a single node of degree $2^i$; the compression is repeated in the subtries. This structure, the level-compressed trie, inherits the good properties of binary tries with respect to neighbour and range searches, while the external path length is significantly decreased. It also has the advantage of being easy to implement. Our analysis shows that the expected depth of a stored element is $\Theta(\log^* n)$ for uniformly distributed data.
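The level-compression step can be illustrated with a simplified Python sketch. Several things here are assumptions rather than the authors' implementation: keys are distinct bit strings of equal length, nodes are plain tuples, there is no path compression, and a node whose first level is incomplete simply falls back to a binary branch with an empty slot. The names `build_trie` and `leaf_depths` are hypothetical.

```python
def build_trie(keys, pos=0, compress=True):
    """Build a trie over distinct, equal-length bit strings.

    A leaf is the key itself, an internal node is (bits, children) with
    2 ** bits children, and None marks an empty child slot.
    With compress=False the result is an ordinary binary trie.
    """
    if len(keys) == 1:
        return keys[0]
    remaining = len(keys[0]) - pos
    bits = 1
    if compress:
        # Grow the branching factor while the next level is still complete,
        # i.e. every pattern of bits + 1 bits occurs among the keys.
        while (bits < remaining
               and len({k[pos:pos + bits + 1] for k in keys}) == 2 ** (bits + 1)):
            bits += 1
    groups = {}
    for key in keys:
        groups.setdefault(key[pos:pos + bits], []).append(key)
    children = []
    for n in range(2 ** bits):
        group = groups.get(format(n, "0{}b".format(bits)))
        children.append(build_trie(group, pos + bits, compress) if group else None)
    return (bits, children)


def leaf_depths(node, depth=0):
    """Return the depth of every stored key."""
    if node is None:
        return []
    if isinstance(node, str):
        return [depth]
    _, children = node
    return [d for child in children for d in leaf_depths(child, depth + 1)]
```

For the keys `000, 001, 010, 011, 100, 111`, the two highest levels are complete, so the root becomes a single node of degree 4. The sorted key depths drop from 2, 2, 3, 3, 3, 3 in the binary trie to 1, 1, 2, 2, 2, 2 in the compressed one.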
Average Profile And Limiting Distribution For A Phrase Size In The Lempel-Ziv Parsing Algorithm
IEEE Trans. Information Theory, 1994
Cited by 28 (15 self)
Consider the parsing algorithm due to Lempel and Ziv that partitions a sequence of length $n$ into variable phrases (blocks) such that a new block is the shortest substring not seen in the past as a phrase. In practice the following parameters are of interest: the number of phrases, the size of a phrase, the number of phrases of a given size, and so forth. In this paper, we focus on the size of a randomly selected phrase, and the average number of phrases of a given size (the so-called average profile of phrase sizes). These parameters can be efficiently analyzed through a digital search tree representation. For a memoryless source with unequal probabilities of symbol generation (the so-called asymmetric Bernoulli model), we prove that the size of a typical phrase is asymptotically normally distributed, with mean and variance explicitly computed. In terms of digital search trees, we prove the normal limiting distribution of the typical depth (i.e., the length of a path from the root to a ...
Burst Tries: A Fast, Efficient Data Structure for String Keys
ACM Transactions on Information Systems, 2002
Cited by 28 (10 self)
Many applications depend on efficient management of large sets of distinct strings in memory. For example, during index construction for text databases a record is held for each distinct word in the text, containing the word itself and information such as counters. We propose a new data structure, the burst trie, that has significant advantages over existing options for such applications: it requires no more memory than a binary tree; it is as fast as a trie; and, while not as fast as a hash table, a burst trie maintains the strings in sorted or near-sorted order. In this paper we describe burst tries and explore the parameters that govern their performance. We experimentally determine good choices of parameters, and compare burst tries to other structures used for the same task, with a variety of data sets. These experiments show that the burst trie is particularly effective for the skewed frequency distributions common in text collections, and dramatically outperforms all other data structures for the task of managing strings while maintaining sort order.
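The abstract does not spell out how a burst trie is organised, so the following Python sketch rests on the usual description of the idea: trie nodes at the top, small containers of string suffixes at the bottom, and a container that is "burst" into a new trie node once it grows past a limit. The container type (a plain list), the size-based burst rule, the default `limit`, and all names are assumptions for illustration only.

```python
class Node:
    """Trie node: each child is either another Node or a container (list) of suffixes."""

    def __init__(self):
        self.children = {}
        self.is_end = False  # True if a key ends exactly at this node


def insert(root, key, limit=4):
    """Insert `key`; burst a container once it holds more than `limit` suffixes."""
    node, i = root, 0
    while i < len(key):
        ch = key[i]
        child = node.children.get(ch)
        if child is None:
            node.children[ch] = [key[i + 1:]]  # start a new container
            return
        if isinstance(child, Node):
            node, i = child, i + 1
            continue
        suffix = key[i + 1:]
        if suffix not in child:
            child.append(suffix)
        if len(child) > limit:
            # Burst: replace the container by a trie node over its suffixes.
            burst = Node()
            for s in child:
                insert(burst, s, limit)
            node.children[ch] = burst
        return
    node.is_end = True


def contains(root, key):
    node = root
    for i, ch in enumerate(key):
        child = node.children.get(ch)
        if child is None:
            return False
        if isinstance(child, list):
            return key[i + 1:] in child
        node = child
    return node.is_end


def sorted_keys(node, prefix=""):
    """Yield all stored keys in lexicographic order."""
    if node.is_end:
        yield prefix
    for ch in sorted(node.children):
        child = node.children[ch]
        if isinstance(child, Node):
            yield from sorted_keys(child, prefix + ch)
        else:
            for suffix in sorted(child):
                yield prefix + ch + suffix
```

Because containers are only sorted during traversal, the stored strings are kept in near-sorted rather than fully sorted order, which matches the trade-off the abstract describes.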
The path length of random skip lists
Acta Informatica, 1994
Cited by 24 (6 self)
The skip list is a recently introduced data structure that may be seen as an alternative to (digital) tries. In the present paper we analyze the path length of random skip lists asymptotically, i.e. we study the cumulated successful search costs. In particular we derive a precise asymptotic result on the variance, being of order $n^2$ (which is in contrast to tries under the symmetric Bernoulli model, where it is only of order $n$). We also intend to present some sort of technical toolkit for the skilful manipulation and asymptotic evaluation of generating functions that appear in this context.
Generalized Digital Trees and their Difference-differential Equations
1992
Cited by 24 (5 self)
Consider a tree partitioning process in which $n$ elements are split into $b$ at the root of a tree ($b$ a design parameter), the rest going recursively into two subtrees with a binomial probability distribution. This extends some familiar tree data structures of computer science like the digital trie and the digital search tree. The exponential generating function for the expected size of the tree satisfies a difference-differential equation of order $b$, $\frac{d^b}{dz^b} f(z) = e^z + 2 e^{z/2} f\left(\frac{z}{2}\right)$. The solution involves going to ordinary (rather than exponential) generating functions, analyzing singularities by means of Mellin transforms and contour integration. The method is of some general interest since a large number of related problems on digital structures can be treated in this way via singularity analysis of ordinary generating functions. Work of this author was supported in part by the Basic Research Action of the E.C. under contract No. 3075 (Project ALCOM). † The resea...