Results 1–10 of 21
Probabilistic Counting Algorithms for Data Base Applications
, 1985
"... This paper introduces a class of probabilistic counting lgorithms with which one can estimate the number of distinct elements in a large collection of data (typically a large file stored on disk) in a single pass using only a small additional storage (typically less than a hundred binary words) a ..."
Abstract

Cited by 340 (6 self)
This paper introduces a class of probabilistic counting algorithms with which one can estimate the number of distinct elements in a large collection of data (typically a large file stored on disk) in a single pass using only a small amount of additional storage (typically less than a hundred binary words) and only a few operations per element scanned. The algorithms are based on statistical observations made on bits of hashed values of records. They are by construction totally insensitive to the replicative structure of elements in the file; they can be used in the context of distributed systems without any degradation of performance and prove especially useful in the context of database query optimisation. © 1985 Academic Press, Inc.
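The single-pass idea can be sketched as follows — a minimal illustration of the bit-pattern observable (position of the lowest set bit of each hashed value), not the paper's full multi-bitmap PCSA estimator; the choice of hash function here is an arbitrary stand-in:

```python
import hashlib

PHI = 0.77351  # correction constant from the Flajolet-Martin analysis

def rho(x: int) -> int:
    """0-based position of the least significant 1-bit of x."""
    return (x & -x).bit_length() - 1 if x else 32

def fm_estimate(items) -> float:
    """Single-pass distinct-count estimate (single-bitmap sketch).

    Each item is hashed; the bitmap records which low-bit positions
    have been observed. Duplicates set the same bit again, so the
    estimate is insensitive to replication, as the paper describes.
    """
    bitmap = 0
    for item in items:
        h = int(hashlib.md5(str(item).encode()).hexdigest(), 16) & 0xFFFFFFFF
        bitmap |= 1 << rho(h)
    # R = index of the lowest zero bit in the bitmap; estimate ~ 2^R / PHI
    r = 0
    while bitmap & (1 << r):
        r += 1
    return (2 ** r) / PHI
```

Note that scanning the same elements repeatedly leaves the bitmap, and hence the estimate, unchanged — the "totally insensitive to the replicative structure" property.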
A Survey of Adaptive Sorting Algorithms
, 1992
"... Introduction and Survey; F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems  Sorting and Searching; E.5 [Data]: Files  Sorting/searching; G.3 [Mathematics of Computing]: Probability and Statistics  Probabilistic algorithms; E.2 [Data Storage Represe ..."
Abstract

Cited by 65 (3 self)
Introduction and Survey; F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems - Sorting and Searching; E.5 [Data]: Files - Sorting/searching; G.3 [Mathematics of Computing]: Probability and Statistics - Probabilistic algorithms; E.2 [Data Storage Representation]: Composite structures, linked representations. General Terms: Algorithms, Theory. Additional Key Words and Phrases: Adaptive sorting algorithms, comparison trees, measures of disorder, nearly sorted sequences, randomized algorithms. CONTENTS: Introduction (I.1 Optimal adaptivity; I.2 Measures of disorder; I.3 Organization of the paper); 1. Worst-Case Adaptive (Internal) Sorting Algorithms (1.1 Generic Sort; 1.2 Cook-Kim division; 1.3 Partition Sort; 1.4 Exponential Search; 1.5 Adaptive Merging); 2. Expected-Case Adaptive ...
Efficient External-Memory Data Structures and Applications
, 1996
"... In this thesis we study the Input/Output (I/O) complexity of largescale problems arising e.g. in the areas of database systems, geographic information systems, VLSI design systems and computer graphics, and design I/Oefficient algorithms for them. A general theme in our work is to design I/Oeffic ..."
Abstract

Cited by 38 (12 self)
In this thesis we study the Input/Output (I/O) complexity of large-scale problems arising e.g. in the areas of database systems, geographic information systems, VLSI design systems and computer graphics, and design I/O-efficient algorithms for them. A general theme in our work is to design I/O-efficient algorithms through the design of I/O-efficient data structures. One of our philosophies is to try to isolate all the I/O-specific parts of an algorithm in the data structures, that is, to try to obtain I/O algorithms from internal memory algorithms by exchanging the data structures used in internal memory with their external memory counterparts. The results in the thesis include a technique for transforming an internal memory tree data structure into an external data structure which can be used in a batched dynamic setting, that is, a setting where, for example, we do not require that the result of a search operation be returned immediately. Using this technique we develop batched dynamic external versions of the (one-dimensional) range tree and the segment tree, and we develop an external priority queue. Following our general philosophy we show how these structures can be used in standard internal memory sorting algorithms.
A General Lower Bound on the I/O-Complexity of Comparison-based Algorithms
 In Proc. Workshop on Algorithms and Data Structures, LNCS 709
, 1993
"... We show a general relationship between the number of comparisons and the number of I/Ooperations needed to solve a given problem. This relationship enables one to show lower bounds on the number of I/Ooperations needed to solve a problem whenever a lower bound on the number of comparisons is known ..."
Abstract

Cited by 31 (11 self)
We show a general relationship between the number of comparisons and the number of I/O-operations needed to solve a given problem. This relationship enables one to show lower bounds on the number of I/O-operations needed to solve a problem whenever a lower bound on the number of comparisons is known. We use the result to show lower bounds on the I/O-complexity of a number of problems where known techniques only give trivial bounds. Among these are the problems of removing duplicates from a multiset, a problem of great importance in e.g. relational database systems, and the problem of determining the mode (the most frequently occurring element) of a multiset. We develop algorithms for these problems in order to show that the lower bounds are tight.
The ins and outs of the probabilistic model checker MRMC
 in Proc. QEST’09
, 2009
"... The Markov Reward Model Checker (MRMC) is a software tool for verifying properties over probabilistic models. It supports PCTL and CSL model checking, and their reward extensions. Distinguishing features of MRMC are its support for computing time and rewardbounded reachability probabilities, (prop ..."
Abstract

Cited by 29 (4 self)
The Markov Reward Model Checker (MRMC) is a software tool for verifying properties over probabilistic models. It supports PCTL and CSL model checking, and their reward extensions. Distinguishing features of MRMC are its support for computing time- and reward-bounded reachability probabilities, (property-driven) bisimulation minimization, and precise on-the-fly steady-state detection. Recent tool features include time-bounded reachability analysis for uniform CTMDPs and CSL model checking by discrete-event simulation. This paper presents the tool's current status and its implementation details.
Grouping and Duplicate Elimination: Benefits of Early Aggregation
, 1997
"... Early aggregation is a technique for speeding up the processing of GROUP BY queries by reducing the amount of intermediate data transferred between main memory and disk. It can also be applied to duplicate elimination because duplicate elimination is equivalent to grouping with no aggregation funct ..."
Abstract

Cited by 11 (1 self)
Early aggregation is a technique for speeding up the processing of GROUP BY queries by reducing the amount of intermediate data transferred between main memory and disk. It can also be applied to duplicate elimination, because duplicate elimination is equivalent to grouping with no aggregation functions. This paper describes six different algorithms for grouping and aggregation, shows how to incorporate early aggregation in each of them, and analyzes the resulting reduction in intermediate data. In addition to the grouping algorithm used, the reduction depends on several factors: the number of groups, the skew in group size distribution, the input size, and the amount of main memory available. All six algorithms considered benefit from early aggregation, with grouping by hash partitioning producing the least amount of intermediate data. If the group size distribution is skewed, the overall reduction can be very significant, even with a modest amount of additional main memory.
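The core of hash-based early aggregation can be sketched in a few lines: each input row is folded into its group's running aggregate immediately, so intermediate state holds one entry per group rather than one per row. This is an illustrative in-memory sketch only; the paper's algorithms additionally handle spilling partitions to disk when memory is exhausted:

```python
from collections import defaultdict

def group_sum(rows):
    """Hash grouping with early aggregation for a SUM-style query.

    rows: iterable of (group_key, value) pairs.
    No row is kept verbatim; intermediate state is one accumulator
    per distinct group, which is where the data reduction comes from.
    """
    acc = defaultdict(int)  # group key -> running SUM
    for key, value in rows:
        acc[key] += value   # aggregate as early as possible
    return dict(acc)
```

With a skewed group-size distribution, a few accumulators absorb most of the input, which is why the paper reports the largest reductions in that case.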
Large alphabets and incompressibility
 Information Processing Letters
"... We briefly survey some concepts related to empirical entropy — normal numbers, de Bruijn sequences and Markov processes — and investigate how well it approximates Kolmogorov complexity. Our results suggest ℓthorder empirical entropy stops being a reasonable complexity metric for almost all strings ..."
Abstract

Cited by 11 (2 self)
We briefly survey some concepts related to empirical entropy (normal numbers, de Bruijn sequences and Markov processes) and investigate how well it approximates Kolmogorov complexity. Our results suggest ℓth-order empirical entropy stops being a reasonable complexity metric for almost all strings of length m over alphabets of size n about when n^ℓ surpasses m. Key words: Data compression
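The quantity in question can be computed directly: for each length-ℓ context, take the 0th-order entropy of the symbols that follow it, weighted by how often the context occurs. A sketch (normalizing by the number of predicted symbols, n − ℓ, rather than n, which is a simplification of the usual definition):

```python
import math
from collections import Counter, defaultdict

def empirical_entropy(s: str, order: int = 0) -> float:
    """ℓth-order empirical entropy of s, in bits per symbol.

    Groups each position by its preceding length-`order` context and
    sums the per-context 0th-order entropies, weighted by context count.
    """
    n = len(s)
    if n <= order:
        return 0.0
    contexts = defaultdict(Counter)
    for i in range(order, n):
        contexts[s[i - order:i]][s[i]] += 1
    h = 0.0
    for counts in contexts.values():
        m = sum(counts.values())
        for c in counts.values():
            h -= c * math.log2(c / m)
    return h / (n - order)
```

The paper's point shows up already in small cases: a string like "abababab" has 0th-order entropy 1 bit per symbol but 1st-order entropy 0, since each symbol is determined by its predecessor.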
Dynamic Shannon Coding
, 2005
"... We present a new algorithm for dynamic prefixfree coding, based on Shannon coding. We give a simple analysis and prove a better upper bound on the length of the encoding produced than the corresponding bound for dynamic Huffman coding. We show how our algorithm can be modified for efficient lengthr ..."
Abstract

Cited by 9 (7 self)
We present a new algorithm for dynamic prefix-free coding, based on Shannon coding. We give a simple analysis and prove a better upper bound on the length of the encoding produced than the corresponding bound for dynamic Huffman coding. We show how our algorithm can be modified for efficient length-restricted coding, alphabetic coding and coding with unequal letter costs.
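The dynamic algorithm builds on classical static Shannon coding, which the following sketch illustrates: a symbol with probability p gets a codeword of ⌈−log₂ p⌉ bits, read off the binary expansion of the cumulative probability. This static construction is textbook material, not the paper's algorithm; the dynamic variant additionally maintains the code as symbol frequencies change.

```python
import math

def shannon_code(probs):
    """Static Shannon code for a probability map {symbol: p}.

    Symbols are processed in order of decreasing probability; each
    codeword is the first ceil(-log2 p) bits of the binary expansion
    of the cumulative probability, which makes the code prefix-free.
    """
    items = sorted(probs.items(), key=lambda kv: -kv[1])
    code, cum = {}, 0.0
    for sym, p in items:
        length = max(1, math.ceil(-math.log2(p)))
        frac, bits = cum, []
        for _ in range(length):
            frac *= 2
            bit = int(frac)
            frac -= bit
            bits.append(str(bit))
        code[sym] = "".join(bits)
        cum += p
    return code
```

For dyadic probabilities such as {a: 1/2, b: 1/4, c: 1/4} this yields the codewords 0, 10, 11, matching the Huffman code; in general Shannon coding can lose up to one bit per symbol versus Huffman, which is the slack the paper's analysis works with.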
Cache-oblivious algorithms and data structures
 In SWAT
, 2004
"... Frigo, Leiserson, Prokop and Ramachandran in 1999 introduced the idealcache model as a formal model of computation for developing algorithms in environments with multiple levels of caching, and coined the terminology of cacheoblivious algorithms. Cacheoblivious algorithms are described as stand ..."
Abstract

Cited by 8 (1 self)
Frigo, Leiserson, Prokop and Ramachandran in 1999 introduced the ideal-cache model as a formal model of computation for developing algorithms in environments with multiple levels of caching, and coined the terminology of cache-oblivious algorithms. Cache-oblivious algorithms are described as standard RAM algorithms with only one memory level, i.e. without any knowledge about memory hierarchies, but are analyzed in the two-level I/O model of Aggarwal and Vitter for an arbitrary memory and block size and an optimal off-line cache replacement strategy. The result is algorithms that automatically apply to multi-level memory hierarchies. This paper gives an overview of the results achieved on cache-oblivious algorithms and data structures since the seminal paper by Frigo et al.
Linear-Space Data Structures for Range Mode Query in Arrays
"... A mode of a multiset S is an element a ∈ S of maximum multiplicity; that is, a occurs at least as frequently as any other element in S. Given an array A[1: n] of n elements, we consider a basic problem: constructing a static data structure that efficiently answers range mode queries on A. Each query ..."
Abstract

Cited by 7 (2 self)
A mode of a multiset S is an element a ∈ S of maximum multiplicity; that is, a occurs at least as frequently as any other element in S. Given an array A[1:n] of n elements, we consider a basic problem: constructing a static data structure that efficiently answers range mode queries on A. Each query consists of an input pair of indices (i, j) for which a mode of A[i:j] must be returned. The best previous data structure with linear space, by Krizanc, Morin, and Smid (ISAAC 2003), requires O(√n log log n) query time. We improve their result and present an O(n)-space data structure that supports range mode queries in O(√(n / log n)) worst-case time. Furthermore, we present strong evidence that a query time significantly below √n cannot be achieved by purely combinatorial techniques; we show that boolean matrix multiplication of two √n × √n matrices reduces to n range mode queries in an array of size O(n). Additionally, we give linear-space data structures for orthogonal range mode in higher dimensions (queries in near-O(n^(1−1/2d)) time) and for halfspace range mode in higher dimensions (queries in O(n^(1−1/d²)) time).
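For contrast with the data structure described in the abstract, the baseline that any range mode structure improves on simply scans the queried range, costing O(j − i) time per query with no preprocessing. This sketch only fixes the query interface; the paper's O(n)-space structure answers the same queries in O(√(n/log n)) worst-case time:

```python
from collections import Counter

def range_mode(a, i, j):
    """Return a mode of a[i..j] (inclusive) by direct counting.

    O(j - i) time per query; ties are broken by first occurrence
    in the range, which is one valid mode.
    """
    counts = Counter(a[i:j + 1])
    return counts.most_common(1)[0][0]
```

A query must return *a* mode, not a unique one, so any maximum-multiplicity element in the range is an acceptable answer.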