Results 1–10 of 79
Efficient filtering of XML documents with XPath expressions
, 2002
Abstract

Cited by 172 (12 self)
We propose a novel index structure, termed XTrie, that supports the efficient filtering of XML documents based on XPath expressions. Our XTrie index structure offers several novel features that make it especially attractive for large-scale publish/subscribe systems. First, XTrie is designed to support effective filtering based on complex XPath expressions (as opposed to simple, single-path specifications). Second, our XTrie structure and algorithms are designed to support both ordered and unordered matching of XML data. Third, by indexing on sequences of element names organized in a trie structure and using a sophisticated matching algorithm, XTrie is able both to reduce the number of unnecessary index probes and to avoid redundant matchings, thereby providing extremely efficient filtering. Our experimental results over a wide range of XML document and XPath expression workloads demonstrate that our XTrie index structure outperforms earlier approaches by wide margins.
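The organizing idea, a trie keyed on sequences of element names, can be sketched in a few lines. This is an illustrative toy, not the paper's XTrie: it handles only simple `/a/b/c` paths with exact root-to-leaf matching, and the `$ids` marker key is a hypothetical convention.

```python
# Toy trie over element-name sequences, in the spirit of XTrie.
# Only simple /a/b/c paths are supported; no predicates, wildcards,
# or the paper's substring decomposition and matching algorithm.
def build_trie(xpaths):
    trie = {}
    for xp in xpaths:
        node = trie
        for name in xp.strip("/").split("/"):
            node = node.setdefault(name, {})
        # "$ids" records which expressions end at this trie node
        node.setdefault("$ids", []).append(xp)
    return trie

def match(trie, doc_path):
    """Return the indexed expressions whose element sequence equals doc_path."""
    node = trie
    for name in doc_path.strip("/").split("/"):
        if name not in node:
            return []
        node = node[name]
    return node.get("$ids", [])
```

Probing the trie once per document path avoids re-examining every subscription, which is the point of indexing the expressions rather than scanning them.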
Reducing the braking distance of an SQL query engine
 In Proc. of the 24th VLDB Conf
, 1998
Abstract

Cited by 87 (6 self)
In a recent paper, we proposed adding a STOP AFTER clause to SQL to permit the cardinality of a query result to be explicitly limited by query writers and query tools. We demonstrated the usefulness of having this clause, showed how to extend a traditional cost-based query optimizer to accommodate it, and demonstrated via DB2-based simulations that large performance gains are possible when STOP AFTER queries are explicitly supported by the database engine. In this paper, we present several new strategies for efficiently processing STOP AFTER queries. These strategies, based largely on the use of range partitioning techniques, offer significant additional savings for handling STOP AFTER queries that yield sizeable result sets. We describe classes of queries where such savings would indeed arise and present experimental measurements that show the benefits and tradeoffs associated with the new processing strategies.
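A minimal sketch of the range-partitioning intuition for a STOP AFTER N query, assuming an in-memory list of values and caller-supplied range boundaries (both hypothetical simplifications of what a query engine would do): rows are bucketed by range, and only as many buckets as are needed to fill N rows are ever sorted.

```python
# Sketch: range partitioning for "first N" processing. Buckets beyond
# the one that fills the quota are never sorted, which is the source
# of the savings the paper describes for sizeable inputs.
import bisect

def stop_after(rows, n, boundaries):
    """Return the n smallest rows; `boundaries` are ascending cut points."""
    buckets = [[] for _ in range(len(boundaries) + 1)]
    for r in rows:
        buckets[bisect.bisect_left(boundaries, r)].append(r)
    out = []
    for b in buckets:          # buckets are already in range order
        if len(out) >= n:
            break              # remaining buckets are skipped unsorted
        out.extend(sorted(b))
    return out[:n]
```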
HAVAL: A One-Way Hashing Algorithm with Variable Length of Output
, 1993
Abstract

Cited by 51 (17 self)
A one-way hashing algorithm is a deterministic algorithm that compresses an arbitrarily long message into a value of specified length. The output value represents the fingerprint or digest of the message. A cryptographically useful property of a one-way hashing algorithm is that it is infeasible to find two distinct messages that have the same fingerprint. This paper proposes a one-way hashing algorithm called HAVAL. HAVAL compresses a message of arbitrary length into a fingerprint of 128, 160, 192, 224 or 256 bits. In addition, HAVAL has a parameter that controls the number of passes in which a message block (of 1024 bits) is processed. A message block can be processed in 3, 4 or 5 passes. By combining output length with pass count, we can provide fifteen (15) choices for practical applications where different levels of security are required. The algorithm is very efficient and particularly suited to the 32-bit computers which predominate in the current workstation market. Experiments show that HAVAL is 60%...
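The parameter space the abstract counts can be enumerated directly; the output lengths and pass counts below are exactly those listed in the abstract.

```python
# Enumerating HAVAL's advertised parameter space: five digest lengths
# crossed with three pass counts gives the fifteen variants the abstract counts.
output_bits = [128, 160, 192, 224, 256]   # fingerprint lengths in bits
passes = [3, 4, 5]                        # passes per 1024-bit message block
variants = [(bits, p) for bits in output_bits for p in passes]
print(len(variants))  # 15 parameter choices
```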
Web Prefetching Using Partial Match Prediction
, 1998
Abstract

Cited by 50 (1 self)
Web traffic is now one of the major components of Internet traffic. One of the main directions of research in this area is to reduce the latencies users experience when navigating through Web sites. Caching is already being used in that direction; yet the characteristics of the Web cause caching in this medium to have poor performance. Therefore, prefetching is now being studied in the Web context. This study investigates the use of partial match prediction, a technique taken from the data compression literature, for prefetching in the Web. The main concern when employing prefetching is to predict as many future requests as possible, while limiting false predictions to a minimum. The simulation results suggest that a high fraction of the predictions are accurate (e.g., predicting 18%–23% of the requests with 90%–80% accuracy), so that additional network traffic is kept low. Furthermore, the simulations show that prefetching can substantially increase cache hit rates.
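A minimal order-2 partial match predictor in the spirit of the PPM-style prediction the study borrows from data compression (a hypothetical simplification, not the paper's exact model): predictions back off from the longest matching context, and only sufficiently confident predictions would be prefetched.

```python
# Sketch of partial match prediction for Web prefetching.
# Contexts of order 2 and 1 are counted; prediction backs off from the
# longest matching context, and a confidence threshold limits false
# predictions (and hence wasted prefetch traffic).
from collections import defaultdict

class PartialMatchPredictor:
    def __init__(self):
        # context tuple -> {next_page: observation count}
        self.counts = defaultdict(lambda: defaultdict(int))
        self.history = []

    def record(self, page):
        """Observe one request, updating every context that preceded it."""
        for order in (1, 2):
            if len(self.history) >= order:
                ctx = tuple(self.history[-order:])
                self.counts[ctx][page] += 1
        self.history.append(page)

    def predict(self, threshold=0.5):
        """Return a page worth prefetching, or None if nothing is confident."""
        for order in (2, 1):  # longest context first
            if len(self.history) < order:
                continue
            followers = self.counts.get(tuple(self.history[-order:]))
            if followers:
                total = sum(followers.values())
                page, hits = max(followers.items(), key=lambda kv: kv[1])
                if hits / total >= threshold:
                    return page
        return None
```

After observing a repeating navigation pattern such as A, B, C, A, B, C, A, B, the order-2 context (A, B) has only ever been followed by C, so C would be prefetched.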
Caching and Lemmaizing in Model Elimination Theorem Provers
, 1992
Abstract

Cited by 49 (2 self)
Theorem provers based on model elimination have exhibited extremely high inference rates but have lacked a redundancy control mechanism such as subsumption. In this paper we report on work done to modify a model elimination theorem prover using two techniques, caching and lemmaizing, that have reduced by more than an order of magnitude the time required to find proofs of several problems and that have enabled the prover to prove theorems previously unobtainable by top-down model elimination theorem provers.
Fast Priority Queues for Cached Memory
 ACM Journal of Experimental Algorithmics
, 1999
Abstract

Cited by 46 (7 self)
This paper advocates the adaptation of external memory algorithms to this purpose. This idea and the practical issues involved are exemplified by engineering a fast priority queue suited to external memory and cached memory that is based on k-way merging. It improves previous external memory algorithms by constant factors that are crucial for transferring it to cached memory. Running in the cache hierarchy of a workstation, the algorithm is at least two times faster than an optimized implementation of binary heaps and 4-ary heaps for large inputs.
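The k-way merging at the heart of the design can be sketched with a small heap of run cursors. This is an illustrative simplification, not the paper's sequence heap: the point it shows is that each sorted run is consumed sequentially, a streaming access pattern that is friendly to both caches and external memory.

```python
# Sketch: merge k sorted runs with a k-entry heap of cursors.
# Each run is scanned front to back, so memory accesses are mostly
# sequential streams rather than random probes.
import heapq

def k_way_merge(runs):
    heap = [(run[0], i, 0) for i, run in enumerate(runs) if run]
    heapq.heapify(heap)
    out = []
    while heap:
        value, i, j = heapq.heappop(heap)
        out.append(value)
        if j + 1 < len(runs[i]):
            # advance this run's cursor and reinsert it
            heapq.heappush(heap, (runs[i][j + 1], i, j + 1))
    return out
```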
On Fundamental Tradeoffs between Delay Bounds and Computational Complexity in Packet Scheduling Algorithms
 in Proceedings of ACM SIGCOMM ’02
, 2002
Abstract

Cited by 31 (2 self)
This paper concerns the computational complexity required for packet scheduling algorithms to achieve tight end-to-end delay bounds. We first focus on the difference between the time a packet finishes service in a scheduling algorithm and its virtual finish time under a GPS (Generalized Processor Sharing) scheduler, called the GPS-relative delay. We prove that, under a slightly restrictive but reasonable computational model, the lower bound on the computational complexity of any scheduling algorithm that guarantees an O(1) GPS-relative delay bound is Ω(log n) (widely believed as a "folklore theorem" but never proved). We also discover that, surprisingly, the complexity lower bound remains the same even if the delay bound is relaxed to O(n^a) for 0 < a < 1. This implies that the delay-complexity tradeoff curve is "flat" in the "interval" [O(1), O(n^a)). We later extend both complexity results (for O(1) and O(n^a) delay) to a much stronger computational model. Finally, we show that the same complexity lower bounds are conditionally applicable to guaranteeing tight end-to-end delay bounds. This is done by untangling the relationship between the GPS-relative delay bound and the end-to-end delay bound.
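The abstract's two lower-bound results, paraphrased in symbols (my notation, not the paper's):

```latex
% Any scheduler guaranteeing O(1) GPS-relative delay needs Omega(log n)
% time per packet, and relaxing the delay bound to O(n^a) does not help:
\text{delay} = O(1) \;\Rightarrow\; \text{complexity} = \Omega(\log n),
\qquad
\text{delay} = O(n^{a}),\ 0 < a < 1 \;\Rightarrow\; \text{complexity} = \Omega(\log n).
```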
Optimal Sampling Strategies in Quicksort and Quickselect
 PROC. OF THE 25TH INTERNATIONAL COLLOQUIUM (ICALP98), VOLUME 1443 OF LNCS
, 1998
Abstract

Cited by 28 (4 self)
It is well known that the performance of quicksort can be substantially improved by selecting the median of a sample of three elements as the pivot of each partitioning stage. This variant is easily generalized to samples of size s = 2k + 1. For large samples the partitions are better, as the median of the sample makes a more accurate estimate of the median of the array to be sorted, but the amount of additional comparisons and exchanges to find the median of the sample also increases. We show that the optimal sample size to minimize the average total cost of quicksort (which includes both comparisons and exchanges) is s = a·√n + o(√n). We also give a closed expression for the constant factor a, which depends on the median-finding algorithm and the costs of elementary comparisons and exchanges. The result above holds in most situations, unless the cost of an exchange exceeds by far the cost of a comparison. In that particular case, it is better to select not the median of...
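The median-of-sample variant itself is easy to sketch (illustrative only; the sample size here is a free parameter, not the paper's optimal s = a·√n, and the list-based partition ignores exchange costs entirely):

```python
# Quicksort with the pivot chosen as the median of a sample of s = 2k + 1
# elements. Larger samples give more balanced partitions at the price of
# more work spent finding the sample median.
import random

def quicksort(a, k=1):
    s = 2 * k + 1                      # sample size
    if len(a) <= s:
        return sorted(a)
    sample = random.sample(a, s)
    pivot = sorted(sample)[k]          # median of the sample
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return quicksort(less, k) + equal + quicksort(greater, k)
```

With k = 1 this is the classic median-of-three rule the abstract starts from.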
Randomized Binary Search Trees
 Journal of the ACM
, 1997
Abstract

Cited by 22 (2 self)
In this paper we present randomized algorithms over binary search trees such that: a) the insertion of a set of keys, in any fixed order, into an initially empty tree always produces a random binary search tree; b) the deletion of any key from a random binary search tree results in a random binary search tree; c) the random choices made by the algorithms are based upon the sizes of the subtrees of the tree; this implies that we can support accesses by rank without additional storage requirements or modification of the data structures; and d) the cost of any elementary operation, measured as the number of visited nodes, is the same as the expected cost of its standard deterministic counterpart; hence, all search and update operations have guaranteed expected cost O(log n), irrespective of any assumption on the input distribution.
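Property (c), randomizing by subtree size, is the heart of the construction: the new key becomes the root of an n-node subtree with probability 1/(n+1). The sketch below follows that rule with a standard root-insertion split; it is a simplified reading of the idea, not the paper's exact algorithm.

```python
# Size-based randomized BST insertion: at each subtree of size n, the new
# key is installed at the root with probability 1/(n+1); otherwise the
# insertion recurses. Subtree sizes are maintained on every path touched.
import random

class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.size = key, None, None, 1

def size(t):
    return t.size if t else 0

def split(t, key):
    """Split t into subtrees holding keys < key and keys >= key."""
    if t is None:
        return None, None
    if t.key < key:
        l, r = split(t.right, key)
        t.right = l
        t.size = 1 + size(t.left) + size(t.right)
        return t, r
    l, r = split(t.left, key)
    t.left = r
    t.size = 1 + size(t.left) + size(t.right)
    return l, t

def insert(t, key):
    if random.randrange(size(t) + 1) == 0:   # probability 1/(n+1)
        root = Node(key)
        root.left, root.right = split(t, key)
        root.size = 1 + size(root.left) + size(root.right)
        return root
    if key < t.key:
        t.left = insert(t.left, key)
    else:
        t.right = insert(t.right, key)
    t.size += 1
    return t
```

Because the stored sizes drive the randomization, they come for free, and they also support rank queries, exactly the point the abstract makes in (c).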
Indexing Compressed Text
 Proceedings of the 4th South American Workshop on String Processing
, 1997
Abstract

Cited by 22 (8 self)
We present a technique to build an index based on suffix arrays for compressed texts. We also propose a compression scheme for textual databases, based on words, that generates a compression code preserving the lexicographical ordering of the text words. As a consequence, it permits sorting the compressed strings to generate the suffix array without decompressing. As the compressed text is under 30% of the size of the original text, we are able to build the suffix array twice as fast on the compressed text. The compressed text plus index is 55–60% of the size of the original text plus index, and search times are reduced to approximately half. We also present analytical and experimental results for different variations of the word-oriented compression paradigm.
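The order-preserving property can be illustrated with a toy word code: assigning integer codes in vocabulary order makes code comparison agree with word comparison, so the suffix array of the coded text equals that of the word sequence. This is a sketch only; the paper's byte-oriented compression scheme is more elaborate.

```python
# Toy order-preserving word code: codes are assigned in sorted vocabulary
# order, so comparing code sequences is equivalent to comparing the
# underlying word sequences, and the suffix array can be built on the
# compressed form without decompressing.
def order_preserving_codes(text_words):
    vocab = sorted(set(text_words))
    code = {w: i for i, w in enumerate(vocab)}   # ascending with word order
    return [code[w] for w in text_words], code

def suffix_array(seq):
    return sorted(range(len(seq)), key=lambda i: seq[i:])

words = "the quick brown fox jumps over the lazy dog".split()
codes, code = order_preserving_codes(words)
sa = suffix_array(codes)   # same order as the suffix array over words
```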