Improved Histograms for Selectivity Estimation of Range Predicates, 1996
Cited by 211 (20 self)
Many commercial database systems maintain histograms to summarize the contents of relations and permit efficient estimation of query result sizes and access plan costs. Although several types of histograms have been proposed in the past, there has never been a systematic study of all histogram aspects, the available choices for each aspect, and the impact of such choices on histogram effectiveness. In this paper, we provide a taxonomy of histograms that captures all previously proposed histogram types and indicates many new possibilities. We introduce novel choices for several of the taxonomy dimensions, and derive new histogram types by combining choices in effective ways. We also show how sampling techniques can be used to reduce the cost of histogram construction. Finally, we present results from an empirical study of the proposed histogram types used in selectivity estimation of range predicates and identify the histogram types that have the best overall performance.
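As a rough illustration of what "selectivity estimation of range predicates" from a histogram involves, the sketch below builds a plain equi-width histogram and estimates the fraction of values falling in a range. This is not one of the histogram types proposed in the paper; the function names `build_equi_width` and `estimate_range_selectivity` are hypothetical, and the estimate assumes values are spread uniformly inside each bucket.

```python
def build_equi_width(values, num_buckets):
    """Build an equi-width histogram: (lowest value, bucket width, counts)."""
    lo, hi = min(values), max(values)
    # Fall back to width 1.0 when all values are identical.
    width = (hi - lo) / num_buckets or 1.0
    counts = [0] * num_buckets
    for v in values:
        # Clamp so the maximum value lands in the last bucket.
        idx = min(int((v - lo) / width), num_buckets - 1)
        counts[idx] += 1
    return lo, width, counts


def estimate_range_selectivity(hist, a, b):
    """Estimate the fraction of values v with a <= v <= b.

    Assumes (hypothetically) a uniform spread within each bucket, so a
    bucket contributes in proportion to its overlap with [a, b].
    """
    lo, width, counts = hist
    total = sum(counts)
    estimate = 0.0
    for i, count in enumerate(counts):
        bucket_lo = lo + i * width
        bucket_hi = bucket_lo + width
        overlap = max(0.0, min(b, bucket_hi) - max(a, bucket_lo))
        estimate += count * overlap / width
    return estimate / total


if __name__ == "__main__":
    hist = build_equi_width(list(range(100)), 10)
    print(estimate_range_selectivity(hist, 0, 49.5))
```

The uniform-within-bucket assumption is the simplest choice; how buckets are formed and how values inside them are approximated are exactly the kinds of design dimensions the abstract says the taxonomy covers.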
Space-Efficient Online Computation of Quantile Summaries - In SIGMOD, 2001
Cited by 154 (1 self)
An ε-approximate quantile summary of a sequence of N elements is a data structure that can answer quantile queries about the sequence to within a precision of εN. We present a new online...
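To make the εN precision guarantee concrete, the following sketch builds a naive offline summary by sorting the data and keeping roughly every 2εN-th element, then checks the rank error of its answers. This is only an illustration of the guarantee, not the online algorithm the paper presents; `build_summary`, `query`, and `rank_error` are hypothetical names, and the data is assumed to fit in memory.

```python
import math
from bisect import bisect_left, bisect_right


def build_summary(data, eps):
    """Keep (rank, value) pairs spaced at most 2*eps*N ranks apart."""
    s = sorted(data)
    n = len(s)
    step = max(1, math.floor(2 * eps * n))
    ranks = list(range(1, n + 1, step))
    if ranks[-1] != n:
        ranks.append(n)  # always keep the maximum
    return [(r, s[r - 1]) for r in ranks], n


def query(summary, phi):
    """Return the stored value whose rank is closest to ceil(phi * N)."""
    entries, n = summary
    target = max(1, math.ceil(phi * n))
    return min(entries, key=lambda e: abs(e[0] - target))[1]


def rank_error(sorted_data, value, phi):
    """Distance between the target rank and the ranks `value` occupies.

    `value` is assumed to be an element of `sorted_data`.
    """
    n = len(sorted_data)
    target = max(1, math.ceil(phi * n))
    lowest = bisect_left(sorted_data, value) + 1
    highest = bisect_right(sorted_data, value)
    if lowest <= target <= highest:
        return 0
    return min(abs(target - lowest), abs(target - highest))


if __name__ == "__main__":
    data = list(range(1000, 0, -1))
    summary = build_summary(data, 0.01)
    answer = query(summary, 0.5)
    print(answer, rank_error(sorted(data), answer, 0.5))
```

Because neighbouring stored ranks are at most 2εN apart, the nearest stored rank is never more than εN away from the target, which is the precision the definition asks for. Sorting the whole input is what an online, space-efficient summary avoids.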
Computing on Data Streams, 1998
Cited by 141 (3 self)
In this paper we study the space requirement of algorithms that make only one (or a small number of) pass(es) over the input data. We study such algorithms under a model of data streams that we introduce here. We give a number of upper and lower bounds for problems stemming from query processing, invoking in the process tools from the area of communication complexity.
Approximate Medians and other Quantiles in One Pass and with Limited Memory, 1998
Cited by 102 (1 self)
We present new algorithms for computing approximate quantiles of large datasets in a single pass. The approximation guarantees are explicit, and apply without regard to the value distribution or the arrival distributions of the dataset. The main memory requirements are smaller than those reported earlier by an order of magnitude. We also discuss methods that couple the approximation algorithms with random sampling to further reduce memory requirements. With sampling, the approximation guarantees are explicit but probabilistic, i.e., they apply with respect to a (user controlled) confidence parameter. We present the algorithms, their theoretical analysis and simulation results.
1 Introduction
This article studies the problem of computing order statistics of large sequences of online or disk-resident data using as little main memory as possible. We focus on computing quantiles, which are elements at specific positions in the sorted order of the input. The φ-quantile, for φ ∈ [0, ...
How to Summarize the Universe: Dynamic Maintenance of Quantiles - In VLDB, 2002
Cited by 89 (12 self)
Order statistics, i.e., quantiles, are frequently used in databases both at the database server as well as the application level. For example, they are useful in selectivity estimation during query optimization, in partitioning large relations, in estimating query result sizes when building user interfaces, and in characterizing the data distribution of evolving datasets in the process of data mining.
Using Lifetime Predictors to Improve Memory Allocation Performance, 1993
Cited by 71 (7 self)
Dynamic storage allocation is used heavily in many application areas including interpreters, simulators, optimizers, and translators. We describe research that can improve all aspects of the performance of dynamic storage allocation by predicting the lifetimes of short-lived objects when they are allocated. Using five significant, allocation-intensive C programs, we show that a great fraction of all bytes allocated are short-lived (> 90% in all cases). Furthermore, we describe an algorithm for lifetime prediction that accurately predicts the lifetimes of 42-99% of all objects allocated. We describe and simulate a storage allocator that takes advantage of lifetime prediction of short-lived objects and show that it can significantly improve a program's memory overhead and reference locality, and even, at times, improve CPU performance as well.
Self-Adaptive Logical Processes: the Probabilistic Distributed Simulation Protocol - In Proc. of the 27th Annual Simulation Symposium, 1994
Cited by 23 (12 self)
A probabilistic distributed discrete event simulation strategy is developed as a performance-efficient compromise between the two classical approaches in parallel and distributed simulation, the conservative and the optimistic approach. It weakens the conservative "block until safe-to-process" rule in the sense that if the time instant of the occurrence of an external event is in the time interval $[s, t]$, it allows progressing simulation up until the forecasted next event instant $\hat{t}(\delta)$, $s \le \hat{t}(\delta) \le t$, but further progression only with controlled probability. $\hat{t}(\delta)$ is an estimate based on the arrival instant differences $\delta = (\delta_1, \delta_2, \ldots, \delta_n)$ observed during a time window by some logical (simulation) process. Compared to the optimistic strategy, it prevents incorrect computations from propagating too far ahead into the simulated future, and thus avoids unnecessary communication overhead by breaking rollback cascades as early as possible. The arrival patterns observed in $\delta$ are ...
Evaluating Models of Memory Allocation - ACM Transactions on Modeling and Computer Simulation, 1992
Cited by 22 (6 self)
Because dynamic memory management is an important part of a large class of computer programs, high-performance algorithms for dynamic memory management have been, and will continue to be, of considerable interest. We evaluate and compare models of the memory allocation behavior in actual programs and investigate how these models can be used to explore the performance of memory management algorithms. These models, if accurate enough, provide an attractive alternative to algorithm evaluation based on trace-driven simulation using actual traces. We explore a range of models of increasing complexity including models that have been used by other researchers. Based on our analysis, we draw three important conclusions. First, a very simple model, which generates a uniform distribution around the mean of observed values, is often quite accurate. Second, two new models we propose show greater accuracy than those previously described in the literature. Finally, none of the models investigated ap...
Histogram-Based Solutions to Diverse Database Estimation Problems - Data Engineering Bulletin, 1995
Cited by 22 (4 self)
Many current database systems use some form of histograms to approximate the frequency distribution of values in the attributes of relations and, based on them, estimate some query result sizes and access plan costs. In this paper, we overview the line of research on histograms that we have followed at the Univ. of Wisconsin. Our goal has been to identify classes of histograms that combine three features in most realistic cases: (i) they produce estimates with small errors, (ii) they are inexpensive to construct, use, and maintain, and (iii) they can be used for many diverse estimation problems. Based on that goal, we present several results, which eventually point towards a class of histograms that are practical, close to optimal, and effective in estimating sizes of query results, value distributions in query results, and even costs of accesses using secondary indices.
Standardization of Event Traces Considered Harmful or Is an Implementation of Object-Independent Event Trace Monitoring and Analysis Systems Possible? - Proc. CNRS-NSF Workshop on Environments and Tools For Parallel Scientific Computing, St. Hilaire du Touvet, France, Elsevier, Advances in Parallel Computing, 1993
Cited by 15 (8 self)
Programming non-sequential computer systems is hard! Many tools and environments have been designed and implemented to ease the use and programming of such systems. The majority of the analysis tools are event-based and use event traces for representing the dynamic behavior of the system under investigation, the object system. Most tools can only be used for one special object system, or a specific class of systems such as distributed shared memory machines. This limitation is not obvious because all tools provide the same basic functionality. This article discusses approaches to implementing object-independent event trace monitoring and analysis systems. The term object-independent means that the system can be used for the analysis of arbitrary (non-sequential) computer systems, operating systems, programming languages and applications. Three main topics are addressed: object-independent monitoring, standardization of event trace formats and access interfaces and the application-indepe...

