Query optimization in database systems - ACM Computing Surveys, 1984
Cited by 194 (0 self)
Efficient methods of processing unanticipated queries are a crucial prerequisite for the success of generalized database management systems. A wide variety of approaches to improve the performance of query evaluation algorithms have been proposed: logic-based and semantic transformations, fast implementations of basic operations, and combinatorial or heuristic algorithms for generating alternative access plans and choosing among them. These methods are presented in the framework of a general query evaluation procedure using the relational calculus representation of queries. In addition, nonstandard query optimization issues such as higher level query evaluation, query optimization in distributed databases, and use of database machines are addressed. The focus, however, is on query optimization in centralized database systems.
Practical Selectivity Estimation through Adaptive Sampling, 1992
Cited by 146 (6 self)
Recently we have proposed an adaptive, random sampling algorithm for general query size estimation. In earlier work we analyzed the asymptotic efficiency and accuracy of the algorithm; in this paper we investigate its practicality as applied to selects and joins. First, we extend our previous analysis to provide significantly improved bounds on the amount of sampling necessary for a given level of accuracy. Next, we provide "sanity bounds" to deal with queries for which the underlying data is extremely skewed or the query result is very small. Finally, we report on the performance of the estimation algorithm as implemented in a host language on a commercial relational system. The results are encouraging, even with this loose coupling between the estimation algorithm and the DBMS.
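The sampling idea in the abstract above can be illustrated with a much-simplified sketch. It draws random rows until enough matches are seen or a cap on the sample count is reached; the cap is a crude stand-in for the "sanity bounds" mentioned above. The function name, `hit_target`, and `max_samples` are assumptions made for illustration and are not the paper's actual stopping rules or bounds.

```python
import random

def estimate_selectivity(rows, predicate, hit_target=50, max_samples=1000, seed=0):
    """Estimate the fraction of rows satisfying `predicate` by random sampling.

    Hypothetical stopping rule (not taken from the paper): stop once
    `hit_target` matching rows have been seen, or once `max_samples`
    rows have been drawn, whichever comes first.
    """
    if not rows:
        return 0.0
    rng = random.Random(seed)
    hits = 0
    drawn = 0
    while hits < hit_target and drawn < max_samples:
        row = rows[rng.randrange(len(rows))]  # sample with replacement
        drawn += 1
        if predicate(row):
            hits += 1
    return hits / drawn if drawn else 0.0
```

Selective predicates exhaust `max_samples` before reaching `hit_target`, which is the case the cap is meant to bound; multiplying the returned fraction by the table size gives a result-size estimate.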
Balancing Histogram Optimality and Practicality for Query Result Size Estimation, 1995
Cited by 125 (14 self)
Many current database systems use histograms to approximate the frequency distribution of values in the attributes of relations and based on them estimate query result sizes and access plan costs. In choosing among the various histograms, one has to balance between two conflicting goals: optimality, so that generated estimates have the least error, and practicality, so that histograms can be constructed and maintained efficiently. In this paper, we present both theoretical and experimental results on several issues related to this trade-off. Our overall conclusion is that the most effective approach is to focus on the class of histograms that accurately maintain the frequencies of a few attribute values and assume the uniform distribution for the rest, and choose for each relation the histogram in that class that is optimal for a self-join query.
1 Introduction
Query optimizers of relational database systems decide on the most efficient access plan for a given query based on a variety...
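The class of histograms described in the abstract above (exact frequencies for a few values, a uniform assumption for the rest) can be sketched as follows. This is an illustrative simplification that keeps only the `k` most frequent values; the function names and the choice of which values to keep are assumptions, not the paper's construction.

```python
from collections import Counter

def build_end_biased(values, k):
    """Keep exact counts for the k most frequent values; summarize the rest."""
    counts = Counter(values)
    top = dict(counts.most_common(k))
    rest_total = sum(counts.values()) - sum(top.values())
    rest_distinct = len(counts) - len(top)
    return top, rest_total, rest_distinct

def estimate_frequency(hist, value):
    """Exact count for a kept value, otherwise the uniform average of the rest.

    Assumes `value` occurs in the relation; absent values also get the average.
    """
    top, rest_total, rest_distinct = hist
    if value in top:
        return float(top[value])
    return rest_total / rest_distinct if rest_distinct else 0.0

def estimate_self_join_size(hist):
    """Self-join size is the sum of squared frequencies over distinct values."""
    top, rest_total, rest_distinct = hist
    exact = float(sum(f * f for f in top.values()))
    if rest_distinct == 0:
        return exact
    avg = rest_total / rest_distinct
    return exact + rest_distinct * avg * avg
```

For example, with frequencies 5, 3, 1, 1 the true self-join size is 36; keeping only the most frequent value gives an estimate of about 33.3, and keeping all four recovers 36 exactly.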
Adaptive Selectivity Estimation Using Query Feedback, 1993
Cited by 97 (6 self)
In this paper, we propose a novel approach for estimating the record selectivities of database queries. The real attribute value distribution is adaptively approximated by a curve-fitting function using a query feedback mechanism. This approach has the advantages of requiring no extra database access overhead for gathering statistics and of being able to continuously adapt the value distribution through queries and updates. Experimental results show that the estimation accuracy of this approach is comparable to traditional methods based on statistics gathering.
1 Introduction
In most database systems, the task of query optimization is to choose an efficient execution plan. Best plan selection requires accurate estimates of the costs of alternative plans. One of the most important factors that affects plan cost is selectivity, which is the number of tuples satisfying a given predicate. Therefore, in most cases, the accuracy of selectivity estimates directly affects the choice of best p...
On the Estimation of Join Result Sizes, 2006
Cited by 15 (0 self)
Good estimates of join result sizes are critical for query optimization in relational database management systems. We address the problem of incrementally obtaining accurate and consistent estimates of join result sizes. We have invented a new rule for choosing join selectivities for estimating join result sizes. The rule is part of a new unified algorithm called Algorithm ELS (Equivalence and Largest Selectivity). Prior to computing any result sizes, equivalence classes are determined for the join columns. The algorithm also takes into account the effect of local predicates on table and column cardinalities. These computations allow the correct selectivity values for each eligible join predicate to be computed. We show that the algorithm is correct and gives better estimates than current estimation algorithms.
Query Size Estimation using Machine Learning - In Database Systems for Advanced Applications, 1996
Cited by 5 (1 self)
In a previous paper [6], we introduced the notion of using machine learning techniques to solve the problem of query size estimation in database query optimisation. In this paper, we build on this work by describing a new generic algorithm to correct the training set of queries for our machine learning method in response to updates. The training set correction algorithm is not only useful in the context of our machine learning approach, but is also useful for improving existing query size estimation methods whose performance deteriorates in the presence of high update loads. A by-product of our correction algorithm is that training sets can be fixed-size, allowing the error-level to be set in advance. Experimental results show that our machine learning technique performs well (and better than alternative methods) after the correction algorithm is applied.
Keywords: Query Size Estimation, Query Optimisation, Machine Learning
1 Introduction
A query optimiser for a database system aims t...
Bucket Skip Merge Join: A Scalable Algorithm for Join Processing in Very Large Databases using Indexes, 1996
Cited by 4 (0 self)
Join processing algorithms play a critical role in efficient query processing. Popular join processing algorithms like merge join and hash join typically access all the data items in the datasets to be joined. In this paper we present a new join algorithm called bucket skip merge join which does not always access all the data items of the join sets. The basic idea is to divide the datasets into buckets that contain monotonically increasing values and maintain high and low values for each bucket. During join processing, the algorithm uses these values to skip a whole bucket or parts of a bucket when it can be determined that a match will not be found for the rest of the data items in the bucket. This considerably reduces the number of memory accesses, disk accesses and CPU time. Results of the performance tests on a prototype system indicate that our algorithm outperforms other popular join algorithms. Specifically, it scales better with the size of the database and the degree of join...
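The bucket-skipping idea in the abstract above can be sketched roughly as follows. This is a simplified illustration, not the paper's algorithm: it skips only whole buckets (not parts of a bucket), works on in-memory lists, and assumes join keys are unique within each input. All function names and the `size` parameter are hypothetical.

```python
def make_buckets(sorted_keys, size):
    """Split a sorted key list into buckets stored as (low, high, items)."""
    buckets = []
    for start in range(0, len(sorted_keys), size):
        chunk = sorted_keys[start:start + size]
        buckets.append((chunk[0], chunk[-1], chunk))
    return buckets

def merge_matches(a, b):
    """Plain two-pointer merge of two sorted lists of unique keys."""
    out = []
    x = y = 0
    while x < len(a) and y < len(b):
        if a[x] == b[y]:
            out.append(a[x])
            x += 1
            y += 1
        elif a[x] < b[y]:
            x += 1
        else:
            y += 1
    return out

def bucket_skip_join(left_keys, right_keys, size=4):
    """Return (matching keys, number of buckets skipped without being read).

    Assumption: keys are unique within each input.
    """
    left = make_buckets(sorted(left_keys), size)
    right = make_buckets(sorted(right_keys), size)
    matches, skipped = [], 0
    i = j = 0
    while i < len(left) and j < len(right):
        l_low, l_high, l_items = left[i]
        r_low, r_high, r_items = right[j]
        if l_high < r_low:        # whole left bucket lies below the right bucket
            i += 1
            skipped += 1
        elif r_high < l_low:      # whole right bucket lies below the left bucket
            j += 1
            skipped += 1
        else:                     # ranges overlap: merge the two buckets
            matches.extend(merge_matches(l_items, r_items))
            if l_high <= r_high:
                i += 1
            if r_high <= l_high:
                j += 1
    return matches, skipped
```

Only the stored low and high values are consulted when a bucket is skipped, which is where the savings in data accesses would come from when the inputs overlap in a small key range.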
Rectangular Attribute Cardinality Map: A New Histogram-like Technique for Query Optimization - in International Database Engineering and Applications Symposium, IDEAS'99, 1999
Cited by 3 (1 self)
Current database systems utilize histograms to approximate frequency distributions of attribute values of relations. These are used to efficiently estimate query result sizes and access plan costs. Even though they have been in use for nearly two decades, there have been no significant mathematical techniques (other than those used in statistics for traditional histogram approximations) for studying them. In this paper, we introduce a new histogram-like approximation strategy, called the Rectangular Attribute Cardinality Map (R-ACM), that aims to approximate the density of the underlying attribute values using the philosophies of numerical integration. In this new histogram-like approximation method, the density function within a given sector is approximated by a rectangular cell, where the height of the cell is obtained so as to guarantee that the actual probability density differs from the approximated one by at most a user-specified tolerance. Furthermore, unlike the two tradit...
Self-Join Size Estimation in Large-scale Distributed Data Systems
Cited by 3 (0 self)
In this work we tackle the open problem of self-join size (SJS) estimation in a large-scale Distributed Data System, where tuples of a relation are distributed over data nodes which comprise an overlay network. Our contributions include adaptations of five well-known centralized SJS estimation techniques (named sequential, cross-sampling, adaptive, bifocal, and sample-count) to the network environment and a novel technique which is based on the use of the Gini coefficient. We develop analyses showing how Gini estimations can lead to estimations of the underlying Zipfian or power-law value distributions. We further contribute distributed sampling algorithms that can estimate the Gini coefficient accurately and efficiently. Finally, we provide detailed experimental evidence supporting the claimed gains in accuracy, precision, and efficiency of the proposed SJS estimation method, compared to the other methods. The proposed approach is the only one to ensure high efficiency, precision, and accuracy regardless of the skew of the underlying data.
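For readers unfamiliar with the two quantities this abstract relates, the sketch below computes them exactly on a small in-memory list. It only illustrates the standard definitions (self-join size as the sum of squared value frequencies, and the Gini coefficient of a frequency distribution); it is not the paper's distributed estimator, and the function names are hypothetical.

```python
from collections import Counter

def self_join_size(values):
    """Exact self-join size: sum of squared frequencies of each distinct value."""
    return sum(f * f for f in Counter(values).values())

def gini(frequencies):
    """Gini coefficient of non-negative frequencies.

    Uses the standard rank-based formula on the ascending-sorted data:
    G = 2 * sum(rank * x) / (n * sum(x)) - (n + 1) / n
    """
    xs = sorted(frequencies)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n
```

A perfectly uniform frequency distribution has a Gini coefficient of 0, and the value grows toward 1 as the distribution becomes more skewed, which is also when the self-join size grows largest for a fixed number of tuples.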
Performance of a two headed disk system when serving database queries under the SCAN policy - ACM Trans. Database Systems, 1989
Cited by 2 (1 self)
Disk drives with movable two-headed arms are now commercially available. The two heads are separated by a fixed number of cylinders. A major problem for optimizing disk head movement, when answering database requests, is the specification of the optimum number of cylinders separating the two heads. An earlier analytical study assumed an FCFS model and concluded that the optimum separation distance should be equal to 0.44657 of the number of cylinders N of the disk. This paper considers that the SCAN scheduling policy is used in file access, and it applies combinatorial analysis to derive exact formulas for the expected head movement. Furthermore, it is proven that the optimum separation distance is $N/2 - 1$ ($\lceil N/2 - 1 \rceil$ and $\lfloor N/2 - 1 \rfloor$) if N is even (odd). In addition, a comparison with a single-headed disk system operating under the same scheduling policy shows that if the two heads are optimally spaced, then the mean seek distance is less than one-half of the value obtained with one head. The fact that the SCAN policy is used for many database applications (for example, batching and secondary key retrieval) demonstrates the potential of two-headed disk systems for improving the performance of database systems.

