Results 1–10 of 11
A New Efficient Radix Sort
, 1994
Abstract

Cited by 30 (7 self)
We present new improved algorithms for the sorting problem. The algorithms are not only efficient but also clear and simple. First, we introduce Forward Radix Sort, which combines the advantages of traditional left-to-right and right-to-left radix sort in a simple manner. We argue that this algorithm will work very well in practice. Adding a preprocessing step, we obtain an algorithm with attractive theoretical properties. For example, n binary strings can be sorted in Θ(n log(B/(n log n) + 2)) time, where B is the minimum number of bits that have to be inspected to distinguish the strings. This is an improvement over the previously best known result by Paige and Tarjan. The complexity may also be expressed in terms of H, the entropy of the input: n strings from a stationary ergodic process can be sorted in Θ(n log(1/H + 1)) time, an improvement over the result recently presented by Chen and Reif.
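The abstract contrasts traditional left-to-right and right-to-left radix sort. For context, here is a minimal sketch of classic left-to-right (MSD) radix sort on byte strings; this is the traditional algorithm the paper builds on, not the paper's Forward Radix Sort itself:

```python
def msd_radix_sort(strings, pos=0):
    """Classic left-to-right (MSD) radix sort on byte strings.

    Partition the strings by the byte at position `pos`, then sort each
    bucket recursively on the next position. Strings that end at `pos`
    are already fully inspected and sort before longer ones.
    """
    if len(strings) <= 1:
        return list(strings)
    # Strings exhausted at this position come first lexicographically.
    done = [s for s in strings if len(s) <= pos]
    buckets = {}
    for s in strings:
        if len(s) > pos:
            buckets.setdefault(s[pos], []).append(s)
    result = done
    for byte in sorted(buckets):
        result.extend(msd_radix_sort(buckets[byte], pos + 1))
    return result

msd_radix_sort([b"rad", b"sort", b"ra", b"fast"])
# -> [b'fast', b'ra', b'rad', b'sort']
```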
Using Difficulty of Prediction to Decrease Computation: Fast Sort, Priority Queue and Convex Hull on Entropy Bounded Inputs
Abstract

Cited by 17 (4 self)
There has been a recent upsurge of interest in the Markov model, and in more general stationary ergodic stochastic distributions, in the theoretical computer science community (e.g., see [Vitter, Krishnan 91], [Karlin, Philips, Raghavan 92], [Raghavan 9...] for use of Markov models for online algorithms, e.g., caching and prefetching). Their results used the fact that compressible sources are predictable (and vice versa), and showed that online algorithms can improve their performance by prediction. Actual page access sequences are in fact somewhat compressible, so their predictive methods can be of benefit. This paper investigates the interesting idea of decreasing computation by using learning in the opposite way, namely to determine the difficulty of prediction. That is, we approximately learn the input distribution, and then improve the performance of the computation when the input is not too predictable, rather than the reverse. To our knowledge, ...
Fast updating of well-balanced trees
 In SWAT 90, 2nd Scandinavian Workshop on Algorithm Theory
, 1990
Abstract

Cited by 14 (0 self)
Trees of optimal and near-optimal height may be represented as a pointer-free structure in an array of size O(n). In this way we obtain an array implementation of a dictionary with O(log n) search cost and O(log² n) update cost, allowing interpolation search to improve the expected search time.

1 Introduction

The binary search tree is a fundamental and well-studied data structure, commonly used in computer applications to implement the abstract data type dictionary. In a comparison-based model of computation, the lower bound on the three basic operations insert, delete and search is ⌈log(n + 1)⌉ comparisons per operation. This bound may be achieved by storing the set in a binary search tree of optimal height.

Definition 1. A binary tree has optimal height if and only if the height of the tree is ⌈log(n + 1)⌉.

A special case of a tree of optimal height is an optimally balanced tree, as defined below.

Definition 2. A binary tree is optimally balanced if and only if the difference in length between the longest and shortest paths is at most one.
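One common way to store a binary search tree of optimal height in an array with no pointers is the heap-style (Eytzinger) layout, where node i keeps its children at indices 2i+1 and 2i+2. The sketch below illustrates that pointer-free idea; it is not necessarily the paper's exact encoding:

```python
def build_implicit_bst(sorted_keys):
    """Lay out a binary search tree of optimal height in a plain array:
    node i has its children at indices 2*i+1 and 2*i+2, so no pointers
    are stored. An in-order walk over the implicit tree shape assigns
    the sorted keys to their array slots."""
    n = len(sorted_keys)
    tree = [None] * n
    it = iter(sorted_keys)

    def fill(i):
        if i < n:
            fill(2 * i + 1)   # left subtree first (in-order)
            tree[i] = next(it)
            fill(2 * i + 2)   # then right subtree
    fill(0)
    return tree

def contains(tree, key):
    """O(log n) search using index arithmetic only."""
    i = 0
    while i < len(tree):
        if key == tree[i]:
            return True
        i = 2 * i + 1 if key < tree[i] else 2 * i + 2
    return False

tree = build_implicit_bst([10, 20, 30, 40, 50])
# tree == [40, 20, 50, 10, 30]: root 40, children 20 and 50, etc.
```

Because the layout is a sorted-order-preserving permutation of a plain array, interpolation-style probing over the underlying sorted sequence remains possible, which is the property the abstract exploits.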
Improved Bounds for Finger Search on a RAM
 In Algorithms – ESA 2003, LNCS Vol. 2832 (Springer, 2003)
, 2003
Abstract

Cited by 8 (7 self)
We present a new finger search tree with O(1) worst-case update time and O(log log d) expected search time with high probability in the Random Access Machine (RAM) model of computation for a large class of input distributions. The parameter d represents the number of elements (distance) between the search element and an element pointed to by a finger, in a finger search tree that stores n elements. For the purposes of the analysis we model the updates by a "balls and bins" combinatorial game that is interesting in its own right, as it involves insertions and deletions of balls according to an unknown distribution.
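The finger-search contract — cost growing with the distance d from the finger rather than with n — can be illustrated on a plain sorted array: gallop outward from the finger in exponentially growing steps, then binary-search the bracketed range. This textbook sketch achieves O(log d), not the paper's O(log log d) tree:

```python
from bisect import bisect_left

def finger_search(a, finger, key):
    """Search sorted list `a` starting from index `finger`.

    Exponential (galloping) probes bracket the key in O(log d) steps,
    where d = |position(key) - finger|; a binary search over that range
    finishes the job. Returns the key's index, or -1 if absent.
    """
    n = len(a)
    if key >= a[finger]:
        lo, step = finger, 1
        while lo + step < n and a[lo + step] <= key:
            step *= 2                      # gallop right
        hi = min(lo + step, n - 1)
        i = bisect_left(a, key, lo, hi + 1)
    else:
        hi, step = finger, 1
        while hi - step >= 0 and a[hi - step] >= key:
            step *= 2                      # gallop left
        lo = max(hi - step, 0)
        i = bisect_left(a, key, lo, hi + 1)
    return i if i < n and a[i] == key else -1
```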
Interpolation Search for Non-Independent Data
 In Proceedings of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA)
, 2004
Abstract

Cited by 6 (0 self)
We define a deterministic metric of “well-behaved data” that enables searching along the lines of interpolation search. Specifically, define ∆ to be the ratio of distances between the farthest and nearest pair of adjacent elements. We develop a data structure that stores a dynamic set of n integers subject to insertions, deletions, and predecessor/successor queries in O(lg ∆) time per operation. This result generalizes interpolation search and interpolation search trees smoothly to non-random (in particular, non-independent) input data. In this sense, we capture the amount of “pseudo-randomness” required for effective interpolation search.
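For context, plain interpolation search probes the position a key "should" occupy under linear interpolation between the current endpoints; the paper's contribution is characterizing, via the deterministic ∆ metric, when this style of probing provably works without distributional assumptions. A minimal sketch of the baseline algorithm:

```python
def interpolation_search(a, key):
    """Interpolation search on a sorted list of integers.

    Instead of probing the midpoint, probe the index predicted by
    linear interpolation between a[lo] and a[hi]. On well-behaved
    (near-uniformly spaced) data this takes O(log log n) expected
    probes; it degrades toward linear on adversarial spacing.
    Returns the key's index, or -1 if absent.
    """
    lo, hi = 0, len(a) - 1
    while lo <= hi and a[lo] <= key <= a[hi]:
        if a[lo] == a[hi]:
            pos = lo
        else:
            # Predicted rank of `key` within a[lo..hi].
            pos = lo + (key - a[lo]) * (hi - lo) // (a[hi] - a[lo])
        if a[pos] == key:
            return pos
        if a[pos] < key:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1
```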
Detecting Near-Duplicates for Web Crawling
 In WWW 2007, Track: Data Mining, Session: Similarity Search
Abstract
Near-duplicate web documents are abundant. Two such documents differ from each other only in a small portion that displays, for example, advertisements. Such differences are irrelevant for web search, so the quality of a web crawler increases if it can assess whether a newly crawled web page is a near-duplicate of a previously crawled web page. In the course of developing a near-duplicate detection system for a multi-billion-page repository, we make two research contributions. First, we demonstrate that Charikar’s fingerprinting technique is appropriate for this goal. Second, we present an algorithmic technique for identifying existing f-bit fingerprints that differ from a given fingerprint in at most k bit positions, for small k. Our technique is useful for both online queries (single fingerprints) and batch queries (multiple fingerprints). Experimental evaluation over real data confirms the practicality of our design.
Dynamic Interpolation Search
Abstract
Abstract. A new data structure called the interpolation search tree (IST) is presented which supports interpolation search as well as insertions and deletions. Amortized insertion and deletion cost is O(log n). The expected search time in a random file is O(log log n). This is true not only for the uniform distribution but for a wide class of probability distributions.

Categories and Subject Descriptors: E.1 [Data Structures]: trees; F.2 [Analysis of Algorithms and ...
Success Rate of Interpolation in Subsegment Prediction
, 1993
Abstract
In this paper we consider a different application of searching, in which interpolation can be used very efficiently. Interpolation is used not to predict the exact position at which a key is located, but rather to predict the subsegment in which it lies.
Dynamic Interpolation Search Revisited
Abstract
Abstract. A new dynamic Interpolation Search (IS) data structure is presented that achieves O(log log n) search time with high probability on unknown continuous or even discrete input distributions with measurable probability of key collisions, including power-law and binomial distributions. No such previous result holds for IS when the probability of key collisions is measurable. Moreover, our data structure exhibits O(1) expected search time with high probability for a wide class of input distributions that contains all those for which o(log log n) expected search time was previously known.