Query optimization in database systems
- ACM Computing Surveys, 1984
Cited by 194 (0 self)
Efficient methods of processing unanticipated queries are a crucial prerequisite for the success of generalized database management systems. A wide variety of approaches to improve the performance of query evaluation algorithms have been proposed: logic-based and semantic transformations, fast implementations of basic operations, and combinatorial or heuristic algorithms for generating alternative access plans and choosing among them. These methods are presented in the framework of a general query evaluation procedure using the relational calculus representation of queries. In addition, nonstandard query optimization issues such as higher level query evaluation, query optimization in distributed databases, and use of database machines are addressed. The focus, however, is on query optimization in centralized database systems.
Practical Selectivity Estimation through Adaptive Sampling
- 1992
Cited by 146 (6 self)
Recently we have proposed an adaptive, random sampling algorithm for general query size estimation. In earlier work we analyzed the asymptotic efficiency and accuracy of the algorithm; in this paper we investigate its practicality as applied to selects and joins. First, we extend our previous analysis to provide significantly improved bounds on the amount of sampling necessary for a given level of accuracy. Next, we provide "sanity bounds" to deal with queries for which the underlying data is extremely skewed or the query result is very small. Finally, we report on the performance of the estimation algorithm as implemented in a host language on a commercial relational system. The results are encouraging, even with this loose coupling between the estimation algorithm and the DBMS.
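As a rough illustration of the sampling idea in this abstract, the sketch below estimates the size of a selection by drawing random rows until either enough matching rows have accumulated or a sample cap is reached. The function name, the thresholds `target_hits` and `max_samples`, and the in-memory list of rows are all assumptions made for illustration; the abstract does not state the paper's actual stopping conditions or bounds.

```python
import random


def adaptive_estimate(rows, predicate, target_hits=100, max_samples=5000, rng=None):
    """Estimate how many rows satisfy `predicate` by adaptive random sampling.

    Sampling (with replacement) stops when `target_hits` matching rows have
    been seen, or when `max_samples` rows have been drawn. The cap is a
    hypothetical stand-in for a bound that stops sampling on very small results.
    """
    if not rows:
        return 0.0
    rng = rng or random.Random()
    hits = 0
    samples = 0
    while hits < target_hits and samples < max_samples:
        row = rows[rng.randrange(len(rows))]
        if predicate(row):
            hits += 1
        samples += 1
    # Scale the observed hit rate up to the full relation.
    return len(rows) * hits / samples
```

A selective predicate stops early on the hit target, while a predicate that almost never matches runs until the cap, which keeps the cost of a tiny result bounded.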
Logic and databases: a deductive approach
- ACM Computing Surveys, 1984
Cited by 130 (2 self)
The purpose of this paper is to show that logic provides a convenient formalism for studying classical database problems. There are two main parts to the paper, devoted respectively to conventional databases and deductive databases. In the first part, we focus on query languages, integrity modeling and maintenance, query optimization, and data
A linear-time probabilistic counting algorithm for database applications
- ACM Transactions on Database Systems, 1990
Cited by 74 (5 self)
We present a probabilistic algorithm for counting the number of unique values in the presence of duplicates. This algorithm has O(q) time complexity, where q is the number of values including duplicates, and produces an estimation with an arbitrary accuracy prespecified by the user using only a small amount of space. Traditionally, accurate counts of unique values were obtained by sorting, which has O(q log q) time complexity. Our technique, called linear counting, is based on hashing. We present a comprehensive theoretical and experimental analysis of linear counting. The analysis reveals an interesting result: A load factor (number of unique values/hash table size) much larger than 1.0 (e.g., 12) can be used for accurate estimation (e.g., 1% of error). We present this technique with two important applications to database problems: namely, (1) obtaining the column cardinality (the number of unique values in a column of a relation) and (2) obtaining the join selectivity (the number of unique values in the join column resulting from an unconditional join divided by the number of unique join column values in the relation to be joined). These two parameters are important statistics that are used in relational query optimization and physical database design.
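The abstract describes linear counting only in outline: values are hashed into a table and the count is estimated from it. A minimal sketch follows, assuming the commonly cited estimator n ≈ -m ln(V), where m is the bitmap size and V is the fraction of slots left empty. The function name, the hash function, and the one-byte-per-slot bitmap are illustrative assumptions, not details taken from the paper.

```python
import hashlib
import math


def linear_count(values, bitmap_size):
    """Estimate the number of distinct items in `values` in one pass."""
    bitmap = bytearray(bitmap_size)  # one byte per slot, for readability
    for v in values:
        # Hash the value's repr to a 64-bit integer, then pick a slot.
        digest = hashlib.blake2b(repr(v).encode("utf-8"), digest_size=8).digest()
        slot = int.from_bytes(digest, "big") % bitmap_size
        bitmap[slot] = 1
    empty = bitmap_size - sum(bitmap)
    if empty == 0:
        # The estimator is undefined when no slot is empty.
        raise ValueError("bitmap is full; use a larger bitmap_size")
    return -bitmap_size * math.log(empty / bitmap_size)
```

Each value is hashed once and no sorting is needed, which is where the O(q) running time comes from; duplicates land in the same slot and so do not affect the estimate.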
Optimizing boolean expressions in object bases
- In Proc. of the Conf. on Very Large Data Bases (VLDB), 1992
Cited by 13 (8 self)
In this paper we address the problem of optimizing the evaluation of boolean expressions in the context of object-oriented data modelling. We develop a new heuristic for optimizing the evaluation sequence of
Absolute Bounds on Set Intersection and Union Sizes From Distribution Information
- 1988
Cited by 3 (0 self)
Estimation of set intersection and union sizes is important for access method selection for a database and other data retrieval problems. Absolute bounds on sizes are often easier to compute than estimates, requiring no distributional or independence assumptions, and can answer many of the same needs. We present a catalog of quick, closed-form bounds on set intersection and union sizes; they can be expressed as rules and managed by a rule-based system architecture. These methods use a variety of statistics precomputed on the data, and exploit homomorphisms (onto mappings) of the data items onto distributions that can be more easily analyzed. The methods can be used anytime, but tend to work best when there are strong or complex correlations in the data. This circumstance is poorly handled by the standard independence-assumption and distributional-assumption estimates, and hence our methods fill a need. 1. Why bounds? Good estimation of the sizes of set intersections and union...
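To make the notion of an absolute bound concrete, the sketch below computes the simplest closed-form bounds that follow from set sizes alone. These are elementary counting facts and are not claimed to be entries from the paper's catalog; the function names and the `universe_size` parameter are assumptions introduced for illustration.

```python
def intersection_bounds(size_a, size_b, universe_size):
    """Return (lower, upper) bounds on |A ∩ B| given only the set sizes."""
    # A and B must overlap by at least the amount they exceed the universe.
    lower = max(0, size_a + size_b - universe_size)
    # The intersection cannot be larger than the smaller set.
    upper = min(size_a, size_b)
    return lower, upper


def union_bounds(size_a, size_b, universe_size):
    """Return (lower, upper) bounds on |A ∪ B| given only the set sizes."""
    # The union is at least as large as the larger set.
    lower = max(size_a, size_b)
    # It cannot exceed the universe, nor the sum of the sizes.
    upper = min(universe_size, size_a + size_b)
    return lower, upper
```

For example, with 700 and 600 items drawn from 1000, the intersection must hold between 300 and 600 items regardless of how the attributes are correlated, which is the property that independence-based estimates cannot guarantee.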
Estimating Block Accesses when Attributes are Correlated
- 1986
Cited by 2 (0 self)
database systems fallaciously assume that attributes are independent. This assumption leads such systems to systematically overestimate the costs of queries and thus to select execution strategies that substantially increase the queries' processing time. In this paper we show how the concepts of Schur concavity and majorization can be used to efficiently estimate the cost of a query when the queried attribute is correlated with the clustering attribute. We will also examine how a block access distribution can be constructed when attributes are correlated in this manner.
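The overestimation described here can be illustrated with two extreme cases. The sketch below is not the paper's Schur-concavity method. It assumes Cardenas's classical formula as the independence-based baseline (records spread uniformly at random over blocks) and compares it with a best case in which the selected records are perfectly clustered and block-aligned. Function and parameter names are hypothetical.

```python
import math


def blocks_if_independent(num_blocks, num_records):
    """Expected blocks touched when each selected record falls in a
    uniformly random block (Cardenas's formula, an assumed baseline)."""
    return num_blocks * (1.0 - (1.0 - 1.0 / num_blocks) ** num_records)


def blocks_if_clustered(records_per_block, num_records):
    """Blocks touched when the selected records are stored contiguously,
    starting at a block boundary (perfect correlation with clustering)."""
    return math.ceil(num_records / records_per_block)
```

With 100 blocks of 10 records each and 50 selected records, the independence baseline predicts about 39.5 block accesses, while the perfectly clustered case needs only 5; real correlated data falls somewhere between these extremes.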

