Probabilistic Counting Algorithms for Data Base Applications
, 1985
Cited by 340 (6 self)
This paper introduces a class of probabilistic counting algorithms with which one can estimate the number of distinct elements in a large collection of data (typically a large file stored on disk) in a single pass using only a small additional storage (typically less than a hundred binary words) and only a few operations per element scanned. The algorithms are based on statistical observations made on bits of hashed values of records. They are by construction totally insensitive to the replicative structure of elements in the file; they can be used in the context of distributed systems without any degradation of performance and prove especially useful in the context of database query optimisation. © 1985 Academic Press, Inc.
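The idea of counting from "bits of hashed values" can be illustrated with a minimal sketch. It assumes the commonly cited form of the estimator: each record is hashed, one of several bitmaps is selected from the hash, the position of the lowest set bit of the remaining hash bits is recorded, and the estimate is derived from the average position of the lowest unset bit across bitmaps. The function names, the use of SHA-256 as the hash, and the default of 64 bitmaps are assumptions made for this illustration, not details taken from the abstract.

```python
import hashlib

# Correction constant commonly quoted for this family of estimators.
PHI = 0.77351


def hash64(value):
    """Map a record to a 64-bit integer (SHA-256 is an illustrative choice)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def lowest_set_bit(x, width):
    """0-based position of the least significant 1 bit, or `width` if x == 0."""
    if x == 0:
        return width
    return (x & -x).bit_length() - 1


def estimate_distinct(items, num_maps=64):
    """Single-pass estimate of the number of distinct items."""
    bitmaps = [0] * num_maps
    for item in items:
        h = hash64(item)
        index = h % num_maps          # which bitmap this record updates
        rest = h // num_maps          # remaining hash bits
        bitmaps[index] |= 1 << lowest_set_bit(rest, 64)

    # For each bitmap, find the position of its lowest zero bit.
    total_r = 0
    for bitmap in bitmaps:
        r = 0
        while bitmap & (1 << r):
            r += 1
        total_r += r

    return num_maps / PHI * 2 ** (total_r / num_maps)
```

Because a repeated record sets the same bit again, duplicates cannot change the result, which matches the insensitivity to replication described above. The sketch is biased for very small inputs (it returns a nonzero value even for an empty collection), so it should be read as an illustration rather than a tuned implementation.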
A Survey of Adaptive Sorting Algorithms
, 1992
Cited by 65 (3 self)
Introduction and Survey; F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems - Sorting and Searching; E.5 [Data]: Files - Sorting/searching; G.3 [Mathematics of Computing]: Probability and Statistics - Probabilistic algorithms; E.2 [Data Storage Representation]: Composite structures, linked representations. General Terms: Algorithms, Theory. Additional Key Words and Phrases: Adaptive sorting algorithms, Comparison trees, Measures of disorder, Nearly sorted sequences, Randomized algorithms. CONTENTS: INTRODUCTION: I.1 Optimal adaptivity; I.2 Measures of disorder; I.3 Organization of the paper. 1. WORST-CASE ADAPTIVE (INTERNAL) SORTING ALGORITHMS: 1.1 Generic Sort; 1.2 Cook-Kim division; 1.3 Partition Sort; 1.4 Exponential Search; 1.5 Adaptive Merging. 2. EXPECTED-CASE ADAPTIV
Efficient External-Memory Data Structures and Applications
, 1996
Cited by 38 (12 self)
In this thesis we study the Input/Output (I/O) complexity of large-scale problems arising e.g. in the areas of database systems, geographic information systems, VLSI design systems and computer graphics, and design I/O-efficient algorithms for them. A general theme in our work is to design I/O-efficient algorithms through the design of I/O-efficient data structures. One of our philosophies is to try to isolate all the I/O-specific parts of an algorithm in the data structures, that is, to try to design I/O algorithms from internal memory algorithms by exchanging the data structures used in internal memory with their external memory counterparts. The results in the thesis include a technique for transforming an internal memory tree data structure into an external data structure which can be used in a batched dynamic setting, that is, a setting where we for example do not require that the result of a search operation is returned immediately. Using this technique we develop batched dynamic external versions of the (one-dimensional) range-tree and the segment-tree and we develop an external priority queue. Following our general philosophy we show how these structures can be used in standard internal memory sorting algorithms
A General Lower Bound on the I/O-Complexity of Comparison-based Algorithms
 In Proc. Workshop on Algorithms and Data Structures, LNCS 709
, 1993
Cited by 32 (11 self)
We show a general relationship between the number of comparisons and the number of I/O-operations needed to solve a given problem. This relationship enables one to show lower bounds on the number of I/O-operations needed to solve a problem whenever a lower bound on the number of comparisons is known. We use the result to show lower bounds on the I/O-complexity of a number of problems where known techniques only give trivial bounds. Among these are the problem of removing duplicates from a multiset, a problem of great importance in e.g. relational database systems, and the problem of determining the mode (the most frequently occurring element) of a multiset. We develop algorithms for these problems in order to show that the lower bounds are tight.
The ins and outs of the probabilistic model checker MRMC
 in Proc. QEST’09
, 2009
Cited by 29 (4 self)
The Markov Reward Model Checker (MRMC) is a software tool for verifying properties over probabilistic models. It supports PCTL and CSL model checking, and their reward extensions. Distinguishing features of MRMC are its support for computing time- and reward-bounded reachability probabilities, (property-driven) bisimulation minimization, and precise on-the-fly steady-state detection. Recent tool features include time-bounded reachability analysis for uniform CTMDPs and CSL model checking by discrete-event simulation. This paper presents the tool’s current status and its implementation details.
Grouping and Duplicate Elimination: Benefits of Early Aggregation
, 1997
Cited by 11 (1 self)
Early aggregation is a technique for speeding up the processing of GROUP BY queries by reducing the amount of intermediate data transferred between main memory and disk. It can also be applied to duplicate elimination because duplicate elimination is equivalent to grouping with no aggregation functions. This paper describes six different algorithms for grouping and aggregation, shows how to incorporate early aggregation in each of them, and analyzes the resulting reduction in intermediate data. In addition to the grouping algorithm used, the reduction depends on several factors: the number of groups, the skew in group size distribution, the input size, and the amount of main memory available. All six algorithms considered benefit from early aggregation with grouping by hash partitioning producing the least amount of intermediate data. If the group size distribution is skewed, the overall reduction can be very significant, even with a modest amount of additional main memory.
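A minimal sketch of the early-aggregation idea, under stated assumptions: rows are (key, value) pairs, the aggregates are COUNT and SUM, main memory holds at most `max_groups` groups, and a Python list stands in for the intermediate data written to disk. The function names and the flush-everything overflow policy are hypothetical choices for illustration and are not one of the six algorithms the paper analyzes.

```python
def early_aggregate(rows, max_groups):
    """Combine rows in a bounded in-memory table; spill partial aggregates.

    Returns a list of (key, count, total) partial records, standing in for
    the intermediate data that would be written to disk.
    """
    if max_groups < 1:
        raise ValueError("max_groups must be at least 1")
    table = {}      # key -> (count, total), bounded by max_groups entries
    spilled = []    # stand-in for intermediate data on disk
    for key, value in rows:
        if key in table:
            count, total = table[key]
            table[key] = (count + 1, total + value)   # aggregated early
        else:
            if len(table) >= max_groups:
                # Memory is full: flush all partial aggregates and start over.
                spilled.extend((k, c, t) for k, (c, t) in table.items())
                table.clear()
            table[key] = (1, value)
    spilled.extend((k, c, t) for k, (c, t) in table.items())
    return spilled


def final_aggregate(partials):
    """Merge partial (key, count, total) records into final per-group results."""
    result = {}
    for key, count, total in partials:
        c, t = result.get(key, (0, 0))
        result[key] = (c + count, t + total)
    return result
```

With six input rows over three groups and room for two groups in memory, this sketch spills five partial records instead of six raw rows; the saving grows when a few groups dominate the input. Duplicate elimination corresponds to keeping only the keys of the final result.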
Large alphabets and incompressibility
 Information Processing Letters
Cited by 11 (2 self)
We briefly survey some concepts related to empirical entropy (normal numbers, de Bruijn sequences and Markov processes) and investigate how well it approximates Kolmogorov complexity. Our results suggest ℓth-order empirical entropy stops being a reasonable complexity metric for almost all strings of length m over alphabets of size n about when n^ℓ surpasses m. Key words: Data compression
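To make the quantity concrete, here is a small sketch that computes ℓth-order empirical entropy using the standard definition: group each character by the ℓ characters preceding it, take the zeroth-order entropy of each group, and average weighted by group size. The function names `h0` and `hk` are chosen for this illustration.

```python
from collections import Counter, defaultdict
from math import log2


def h0(s):
    """Zeroth-order empirical entropy of a sequence, in bits per character."""
    m = len(s)
    if m == 0:
        return 0.0
    return sum(c / m * log2(m / c) for c in Counter(s).values())


def hk(s, k):
    """k-th order empirical entropy: context-weighted average of h0."""
    if k == 0:
        return h0(s)
    followers = defaultdict(list)
    for i in range(k, len(s)):
        # Record the character that follows each length-k context.
        followers[s[i - k:i]].append(s[i])
    if not s:
        return 0.0
    return sum(len(f) * h0(f) for f in followers.values()) / len(s)
```

For a string whose characters are all distinct, such as "abcdefgh", `h0` is 3 bits per character but `hk(s, 1)` is 0, because every context occurs once and predicts its successor perfectly. This is a toy version of the effect the abstract describes: once the number of possible contexts exceeds the string length, the measure stops reflecting how compressible the string is.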
Cache-oblivious algorithms and data structures
 IN SWAT
, 2004
Cited by 10 (1 self)
Frigo, Leiserson, Prokop and Ramachandran in 1999 introduced the ideal-cache model as a formal model of computation for developing algorithms in environments with multiple levels of caching, and coined the terminology of cache-oblivious algorithms. Cache-oblivious algorithms are described as standard RAM algorithms with only one memory level, i.e. without any knowledge about memory hierarchies, but are analyzed in the two-level I/O model of Aggarwal and Vitter for an arbitrary memory and block size and an optimal offline cache replacement strategy. The results are algorithms that automatically apply to multi-level memory hierarchies. This paper gives an overview of the results achieved on cache-oblivious algorithms and data structures since the seminal paper by Frigo et al.
Dynamic Shannon Coding
, 2005
Cited by 9 (7 self)
We present a new algorithm for dynamic prefix-free coding, based on Shannon coding. We give a simple analysis and prove a better upper bound on the length of the encoding produced than the corresponding bound for dynamic Huffman coding. We show how our algorithm can be modified for efficient length-restricted coding, alphabetic coding and coding with unequal letter costs.
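As background, the following sketch shows static Shannon coding only, not the dynamic algorithm of the paper: a symbol occurring c times in a text of length m gets a codeword of length ⌈log2(m/c)⌉, and codewords are then assigned in canonical order so that the code is prefix-free. The function names are hypothetical, and forcing a minimum length of 1 for single-symbol texts is an assumption made so that every symbol has a nonempty codeword.

```python
from collections import Counter


def shannon_code_lengths(text):
    """Codeword length ceil(log2(m / c)) per symbol, using integer arithmetic."""
    counts = Counter(text)
    total = len(text)
    lengths = {}
    for symbol, count in counts.items():
        length = 0
        while (count << length) < total:   # smallest length with c * 2^l >= m
            length += 1
        lengths[symbol] = max(1, length)   # assumption: no empty codewords
    return lengths


def canonical_code(lengths):
    """Assign prefix-free binary codewords from lengths satisfying Kraft."""
    codes = {}
    code = 0
    previous_length = 0
    for symbol, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= length - previous_length  # extend to the new codeword length
        codes[symbol] = format(code, "0{}b".format(length))
        code += 1
        previous_length = length
    return codes
```

For the text "aaaabbcd" this gives lengths 1, 2, 3, 3 and the codewords 0, 10, 110, 111. A dynamic coder must additionally update the code as symbol frequencies change while the text is read; that part is not attempted here.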
Distribution-Sensitive Algorithms
 NORDIC J. COMPUT
, 1998
Cited by 7 (0 self)
We investigate a new paradigm of algorithm design for geometric problems that can be termed distribution-sensitive. Our notion of distribution is more combinatorial in nature than spatial. We illustrate this on problems like planar-hulls and 2D-maxima where some of the previously known output-sensitive algorithms are recast in this setting. In a number of cases, the distribution-sensitive analysis yields superior results for the above problems. Moreover these bounds are shown to be tight in the linear decision tree model. Our approach owes its spirit to the results known for sorting multisets and we exploit this relationship further to derive fast and efficient parallel algorithms for sorting multisets along with the geometric problems.