Probabilistic Counting Algorithms for Data Base Applications
, 1985
Cited by 284 (6 self)
This paper introduces a class of probabilistic counting algorithms with which one can estimate the number of distinct elements in a large collection of data (typically a large file stored on disk) in a single pass using only a small additional storage (typically less than a hundred binary words) and only a few operations per element scanned. The algorithms are based on statistical observations made on bits of hashed values of records. They are by construction totally insensitive to the replicative structure of elements in the file; they can be used in the context of distributed systems without any degradation of performance and prove especially useful in the context of database query optimisation. © 1985 Academic Press, Inc.
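The idea of estimating a distinct count from bit patterns of hashed values can be sketched as follows. This is a minimal illustration in the spirit of the abstract, not the paper's exact procedure: the function names, the use of SHA-256 as the hash, and the default of 64 bitmaps are assumptions made here, and 0.77351 is the correction constant usually quoted for this family of estimators.

```python
import hashlib

PHI = 0.77351  # correction constant commonly quoted for this estimator


def _hash64(item):
    """Map an item to a 64-bit integer (SHA-256 is an illustrative choice)."""
    digest = hashlib.sha256(repr(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def _rho(x, width=58):
    """Index of the least significant 1-bit of x (width if x is zero)."""
    if x == 0:
        return width
    return (x & -x).bit_length() - 1


def estimate_distinct(items, num_maps=64):
    """Estimate the number of distinct items in a single pass.

    Each item is hashed; the low bits pick one of num_maps bitmaps and
    the remaining bits decide which bit of that bitmap is set.
    """
    bitmaps = [0] * num_maps
    for item in items:
        h = _hash64(item)
        bitmaps[h % num_maps] |= 1 << _rho(h // num_maps)

    # For each bitmap, find the position of its lowest unset bit.
    total_r = 0
    for bitmap in bitmaps:
        r = 0
        while bitmap & (1 << r):
            r += 1
        total_r += r

    return num_maps / PHI * 2 ** (total_r / num_maps)
```

Because setting a bit twice changes nothing, repeated elements leave the bitmaps untouched, which is the insensitivity to duplicates that the abstract describes. Bitmaps built on separate machines could likewise be combined with a bitwise OR.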
A Survey of Adaptive Sorting Algorithms
, 1992
Cited by 55 (3 self)
Introduction and Survey; F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems --- Sorting and Searching; E.5 [Data]: Files --- Sorting/searching; G.3 [Mathematics of Computing]: Probability and Statistics --- Probabilistic algorithms; E.2 [Data Storage Representation]: Composite structures, linked representations.
General Terms: Algorithms, Theory.
Additional Key Words and Phrases: Adaptive sorting algorithms, Comparison trees, Measures of disorder, Nearly sorted sequences, Randomized algorithms.
Contents: Introduction (I.1 Optimal adaptivity; I.2 Measures of disorder; I.3 Organization of the paper). 1. Worst-case adaptive (internal) sorting algorithms (1.1 Generic Sort; 1.2 Cook--Kim division; 1.3 Partition Sort; 1.4 Exponential Search; 1.5 Adaptive Merging). 2. Expected-case adaptiv...
Efficient External-Memory Data Structures and Applications
, 1996
Cited by 38 (12 self)
In this thesis we study the Input/Output (I/O) complexity of large-scale problems arising e.g. in the areas of database systems, geographic information systems, VLSI design systems and computer graphics, and design I/O-efficient algorithms for them. A general theme in our work is to design I/O-efficient algorithms through the design of I/O-efficient data structures. One of our philosophies is to try to isolate all the I/O-specific parts of an algorithm in the data structures, that is, to try to design I/O algorithms from internal memory algorithms by exchanging the data structures used in internal memory with their external memory counterparts. The results in the thesis include a technique for transforming an internal memory tree data structure into an external data structure which can be used in a batched dynamic setting, that is, a setting where, for example, we do not require that the result of a search operation is returned immediately. Using this technique we develop batched dynamic external versions of the (one-dimensional) range-tree and the segment-tree and we develop an external priority queue. Following our general philosophy we show how these structures can be used in standard internal memory sorting algorithms
A General Lower Bound on the I/O-Complexity of Comparison-based Algorithms
- In Proc. Workshop on Algorithms and Data Structures, LNCS 709
, 1993
Cited by 31 (11 self)
We show a general relationship between the number of comparisons and the number of I/O-operations needed to solve a given problem. This relationship enables one to show lower bounds on the number of I/O-operations needed to solve a problem whenever a lower bound on the number of comparisons is known. We use the result to show lower bounds on the I/O-complexity of a number of problems where known techniques only give trivial bounds. Among these are the problem of removing duplicates from a multiset, a problem of great importance in e.g. relational database systems, and the problem of determining the mode (the most frequently occurring element) of a multiset. We develop algorithms for these problems in order to show that the lower bounds are tight.
Grouping and Duplicate Elimination: Benefits of Early Aggregation
, 1997
Cited by 10 (1 self)
Early aggregation is a technique for speeding up the processing of GROUP BY queries by reducing the amount of intermediate data transferred between main memory and disk. It can also be applied to duplicate elimination because duplicate elimination is equivalent to grouping with no aggregation functions. This paper describes six different algorithms for grouping and aggregation, shows how to incorporate early aggregation in each of them, and analyzes the resulting reduction in intermediate data. In addition to the grouping algorithm used, the reduction depends on several factors: the number of groups, the skew in group size distribution, the input size, and the amount of main memory available. All six algorithms considered benefit from early aggregation, with grouping by hash partitioning producing the least amount of intermediate data. If the group size distribution is skewed, the overall reduction can be very significant, even with a modest amount of additional main memory.
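To make the idea of early aggregation concrete, here is a small sketch for a COUNT-style GROUP BY. It is not one of the six algorithms from the paper: the function name, the `capacity` parameter (which must be at least 1) and the in-memory list standing in for disk are all assumptions chosen for illustration. Records of the same group are combined in a bounded in-memory table before anything is written out, so fewer intermediate records are produced than there are input records.

```python
from collections import defaultdict


def group_count_early(keys, capacity):
    """Count occurrences per key using a table of at most `capacity` groups.

    Returns the final counts and the number of intermediate records
    that would have been written to disk.
    """
    table = {}      # in-memory partial aggregates
    spilled = []    # stands in for intermediate data on disk

    for key in keys:
        if key in table:
            table[key] += 1            # aggregate early, no new record
        elif len(table) < capacity:
            table[key] = 1
        else:
            # Table is full: write out the partial aggregates and start over.
            spilled.extend(table.items())
            table = {key: 1}

    spilled.extend(table.items())

    # Final pass: merge partial aggregates that belong to the same group.
    final = defaultdict(int)
    for key, partial in spilled:
        final[key] += partial
    return dict(final), len(spilled)
```

With 100 input records over four groups and room for only two groups in memory, a skewed input such as `["a"] * 50 + ["b"] * 30 + ["c", "d"] * 10` produces just four intermediate records in this sketch.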
Large Alphabets and Incompressibility
- Information Processing Letters
, 2006
Cited by 10 (2 self)
We briefly survey some concepts related to empirical entropy --- normal numbers, de Bruijn sequences and Markov processes --- and investigate how well it approximates Kolmogorov complexity. Our results suggest $\ell$th-order empirical entropy stops being a reasonable complexity metric for almost all strings of length $m$ over alphabets of size $n$ about when $n^{\ell}$ surpasses $m$.
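For readers unfamiliar with the measure, the sketch below computes empirical entropy under its usual definition: for order 0, the entropy of the character frequencies; for higher orders, the weighted average of the order-0 entropy of the characters that follow each context of the given length, divided by the string length. The function names are illustrative and the definition is the standard one, assumed here rather than taken from the abstract.

```python
import math
from collections import Counter, defaultdict


def _entropy(counter):
    """Order-0 entropy, in bits per symbol, of a Counter of frequencies."""
    total = sum(counter.values())
    return -sum(c / total * math.log2(c / total) for c in counter.values())


def empirical_entropy(s, order):
    """Empirical entropy of string s for the given context length."""
    if not s:
        return 0.0
    if order == 0:
        return _entropy(Counter(s))

    # Collect, for every context of `order` characters, what follows it.
    followers = defaultdict(Counter)
    for i in range(order, len(s)):
        followers[s[i - order:i]][s[i]] += 1

    weighted = sum(sum(c.values()) * _entropy(c) for c in followers.values())
    return weighted / len(s)
```

A string such as `"abcdefgh"` has order-0 entropy of 3 bits per symbol but order-1 entropy of 0, because no context ever repeats. This is the kind of degenerate behaviour that appears once there are more possible contexts than positions in the string.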
Cache-oblivious algorithms and data structures
- In SWAT
, 2004
Cited by 7 (1 self)
Frigo, Leiserson, Prokop and Ramachandran in 1999 introduced the ideal-cache model as a formal model of computation for developing algorithms in environments with multiple levels of caching, and coined the terminology of cache-oblivious algorithms. Cache-oblivious algorithms are described as standard RAM algorithms with only one memory level, i.e. without any knowledge about memory hierarchies, but are analyzed in the two-level I/O model of Aggarwal and Vitter for an arbitrary memory and block size and an optimal off-line cache replacement strategy. The results are algorithms that automatically apply to multi-level memory hierarchies. This paper gives an overview of the results achieved on cache-oblivious algorithms and data structures since the seminal paper by Frigo et al.
Dynamic Shannon Coding
, 2005
Cited by 7 (5 self)
We present a new algorithm for dynamic prefix-free coding, based on Shannon coding. We give a simple analysis and prove a better upper bound on the length of the encoding produced than the corresponding bound for dynamic Huffman coding. We show how our algorithm can be modified for efficient length-restricted coding, alphabetic coding and coding with unequal letter costs.
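As background, the sketch below shows static Shannon coding only, not the dynamic algorithm of the paper: a symbol with relative frequency p is given a codeword of length ceil(log2(1/p)), and these lengths satisfy the Kraft inequality, so a prefix-free code with them exists. The function names are illustrative, and the lengths are computed with integer arithmetic to avoid floating-point rounding.

```python
from fractions import Fraction


def shannon_code_lengths(counts):
    """Return ceil(log2(total / count)) for every symbol in `counts`."""
    total = sum(counts.values())
    lengths = {}
    for symbol, count in counts.items():
        if count <= 0:
            raise ValueError("every symbol needs a positive count")
        # Smallest length such that count * 2**length >= total.
        length = 0
        while count << length < total:
            length += 1
        lengths[symbol] = length
    return lengths


def kraft_sum(lengths):
    """Sum of 2**(-length); a value of at most 1 admits a prefix-free code."""
    return sum(Fraction(1, 2 ** length) for length in lengths.values())
```

For the counts `{"a": 4, "b": 2, "c": 1, "d": 1}` the lengths come out as 1, 2, 3 and 3 bits, and the Kraft sum is exactly 1.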
Distribution-Sensitive Algorithms
- NORDIC J. COMPUT
, 1998
Cited by 5 (0 self)
We investigate a new paradigm of algorithm design for geometric problems that can be termed distribution-sensitive. Our notion of distribution is more combinatorial in nature than spatial. We illustrate this on problems like planar hulls and 2D-maxima, where some of the previously known output-sensitive algorithms are recast in this setting. In a number of cases, the distribution-sensitive analysis yields superior results for the above problems. Moreover, these bounds are shown to be tight in the linear decision tree model. Our approach owes its spirit to the results known for sorting multisets and we exploit this relationship further to derive fast and efficient parallel algorithms for sorting multisets along with the geometric problems.
Distributed Computation of the Mode
- PODC'08
, 2008
Cited by 3 (1 self)
This paper studies the problem of computing the most frequent element (the mode) by means of a distributed algorithm where the elements are located at the nodes of a network. Let $k$ denote the number of distinct elements and further let $m_i$ be the number of occurrences of the element $e_i$ in the ordered list of occurrences $m_1 \ge m_2 \ge \dots \ge m_k$. We give a deterministic distributed algorithm with time complexity $O(D + k)$ where $D$ denotes the diameter of the graph, which is essentially tight. As our main contribution, a Monte Carlo algorithm is presented which computes the mode in $O(D + F_2/m_1^2 \cdot \log k)$ time with high probability, where the frequency moment $F_\ell$ is defined as $F_\ell = \sum_{i=1}^{k} m_i^{\ell}$. This algorithm is substantially faster than the deterministic algorithm for various relevant frequency distributions. Moreover, we provide a lower bound of $\Omega(D + F_5/(m_1^5 B))$, where $B$ is the maximum message size, that captures the effect of the frequency distribution on the time complexity to compute the mode.
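The quantities in these bounds are easy to compute on a single machine, which helps in seeing how the frequency distribution enters. The sketch below is a centralized illustration only, not the distributed algorithm; the function names are assumptions. It computes a frequency moment (the sum of the occurrence counts raised to a given power) and the ratio of the second moment to the squared count of the mode, the factor that multiplies log k in the Monte Carlo bound.

```python
from collections import Counter


def frequency_moment(items, ell):
    """Sum over distinct elements of (number of occurrences) ** ell."""
    counts = Counter(items)
    return sum(c ** ell for c in counts.values())


def mode_and_ratio(items):
    """Return the mode and the second moment divided by its squared count.

    Expects a non-empty collection; ties for the mode are broken by
    first appearance.
    """
    counts = Counter(items)
    mode, m1 = counts.most_common(1)[0]
    return mode, frequency_moment(items, 2) / (m1 ** 2)
```

When one element dominates, the ratio is close to 1; when all k distinct elements are equally frequent, it equals k, which matches the O(D + k) behaviour of the deterministic algorithm up to the logarithmic factor.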

