Practical Selectivity Estimation through Adaptive Sampling
1992
Cited by 146 (6 self)
Recently we have proposed an adaptive, random sampling algorithm for general query size estimation. In earlier work we analyzed the asymptotic efficiency and accuracy of the algorithm; in this paper we investigate its practicality as applied to selects and joins. First, we extend our previous analysis to provide significantly improved bounds on the amount of sampling necessary for a given level of accuracy. Next, we provide "sanity bounds" to deal with queries for which the underlying data is extremely skewed or the query result is very small. Finally, we report on the performance of the estimation algorithm as implemented in a host language on a commercial relational system. The results are encouraging, even with this loose coupling between the estimation algorithm and the DBMS.
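The stopping behaviour described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the function name `adaptive_estimate` and the thresholds `hit_target` and `max_samples` are assumptions chosen for illustration, with `max_samples` playing the role of a "sanity bound" that caps sampling when the result is very small.

```python
import random

def adaptive_estimate(table, predicate, hit_target=50, max_samples=1000, rng=None):
    """Estimate how many rows of `table` satisfy `predicate`.

    Rows are sampled uniformly with replacement. Sampling stops as soon as
    `hit_target` matching rows have been seen, or after `max_samples` draws
    (an illustrative sanity bound). Both thresholds are hypothetical and
    must be positive.
    """
    if not table:
        return 0.0
    rng = rng or random.Random()
    hits = 0
    samples = 0
    while hits < hit_target and samples < max_samples:
        row = table[rng.randrange(len(table))]
        samples += 1
        if predicate(row):
            hits += 1
    # Scale the observed hit fraction up to the full table size.
    return len(table) * hits / samples
```

The sample size adapts to the query: a selective predicate runs into the cap, while a predicate that matches most rows stops after roughly `hit_target` draws.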
Adaptive Selectivity Estimation Using Query Feedback
1993
Cited by 97 (6 self)
In this paper, we propose a novel approach for estimating the record selectivities of database queries. The real attribute value distribution is adaptively approximated by a curve-fitting function using a query feedback mechanism. This approach has the advantages of requiring no extra database access overhead for gathering statistics and of being able to continuously adapt the value distribution through queries and updates. Experimental results show that the estimation accuracy of this approach is comparable to traditional methods based on statistics gathering.
1 Introduction
In most database systems, the task of query optimization is to choose an efficient execution plan. Best plan selection requires accurate estimates of the costs of alternative plans. One of the most important factors that affects plan cost is selectivity, which is the number of tuples satisfying a given predicate. Therefore, in most cases, the accuracy of selectivity estimates directly affects the choice of best p...
Parallel Query Processing
ACM Computing Surveys, 1993
Cited by 54 (0 self)
With relations growing larger and queries becoming more complex, parallel query processing is an increasingly attractive option for improving the performance of database systems. The objective of this paper is to examine the various issues encountered in parallel query processing and the techniques available for addressing these issues. The focus of the paper is on the join operation, with both sort-merge and hash joins considered. Three types of parallelism can be exploited, namely intra-operator, inter-operator, and inter-query parallelism. In intra-operator parallelism the major issue is task creation, and the objective is to split a join operation into tasks in a manner such that the load can be spread evenly across a given number of processors. This is a challenge when the values on the join attribute are not uniformly distributed. Inter-operator parallelism can be achieved either through parallel execution of independent operations or through pipelining. In either case,...
Multi-dimensional Selectivity Estimation Using Compressed Histogram
In SIGMOD, 1999
Cited by 51 (1 self)
The database query optimizer requires an estimate of query selectivity to find the most efficient access plan. For queries referencing multiple attributes from the same relation, we need a multi-dimensional selectivity estimation technique when the attributes are dependent on each other, because the selectivity is determined by the joint data distribution of the attributes. Additionally, for multimedia databases, there are intrinsic requirements for multi-dimensional selectivity estimation because feature vectors are stored in multi-dimensional indexing trees. In the 1-dimensional case, a histogram is the preferred choice in practice. In the multi-dimensional case, however, a histogram is not adequate because of high storage overhead and high error rates. In this paper, we propose a novel approach for multi-dimensional selectivity estimation. Compressed information from a large number of small-sized histogram buckets is maintained using the discrete cosine transform. This ena...
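The idea of compressing histogram buckets with the discrete cosine transform can be sketched in one dimension. This is a simplified illustration rather than the paper's multi-dimensional method: the function names are hypothetical, and keeping only the first few low-frequency coefficients is an assumption about how coefficients might be selected.

```python
import math

def dct(values):
    """Orthonormal DCT-II of a list of bucket counts."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        out.append(scale * sum(
            v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, v in enumerate(values)))
    return out

def inverse_dct(coeffs, n):
    """Rebuild n bucket counts from the first len(coeffs) coefficients.

    Coefficients beyond len(coeffs) are treated as zero.
    """
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt((1 if k == 0 else 2) / n)
            total += scale * c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(total)
    return out

def estimate_range(coeffs, n, lo, hi):
    """Approximate the number of tuples in buckets lo..hi (inclusive)."""
    return sum(inverse_dct(coeffs, n)[lo:hi + 1])

# Hypothetical 8-bucket histogram, compressed to 3 coefficients.
histogram = [40, 38, 35, 30, 22, 15, 9, 5]
compressed = dct(histogram)[:3]
approx_count = estimate_range(compressed, len(histogram), 0, 3)
```

Because the first coefficient carries the total count, the truncated reconstruction still sums to the original number of tuples; only the distribution across buckets is approximated.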
A Query Sampling Method for Estimating Local Cost Parameters in a Multidatabase System
In IEEE International Conference on Data Engineering, 1994
Cited by 31 (8 self)
In a multidatabase system (MDBS), some query optimization information related to local database systems may not be available at the global level because of local autonomy. To perform global query optimization, a method is required to derive the necessary local information. This paper presents a new method that employs a query sampling technique to estimate the cost parameters of an autonomous local database system. We introduce a classification for grouping local queries and suggest a cost estimation formula for the queries in each class. We present a procedure to draw a sample of queries from each class and use the observed costs of sample queries to determine the cost parameters by multiple regression. Experimental results indicate that the method is quite promising for estimating the cost of local queries in an MDBS.
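The regression step mentioned in this abstract can be sketched as an ordinary least-squares fit. The cost formula used here (a constant plus one coefficient per explanatory variable) and the choice of variables in the usage example (operand and result cardinalities) are assumptions for illustration; the abstract does not give the paper's actual formulas or query classes.

```python
def fit_cost_parameters(samples):
    """Least-squares fit of cost = c0 + c1*x1 + ... + cm*xm.

    `samples` is a list of (features, observed_cost) pairs, one per sample
    query in a class; `features` holds the m explanatory variables.
    Returns [c0, c1, ..., cm].
    """
    rows = [[1.0] + [float(x) for x in features] for features, _ in samples]
    costs = [float(cost) for _, cost in samples]
    m = len(rows[0])
    # Augmented matrix of the normal equations (X^T X) c = X^T y.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(m)]
         + [sum(r[i] * y for r, y in zip(rows, costs))]
         for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("sample queries do not determine all parameters")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(m):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][m] / a[i][i] for i in range(m)]

# Hypothetical observations: (operand cardinality, result cardinality), cost.
observed = [((10, 1), 10.0), ((20, 2), 18.0), ((30, 1), 20.0),
            ((40, 5), 37.0), ((50, 3), 36.0)]
parameters = fit_cost_parameters(observed)
```

In this sketch the fitted parameters would then be plugged into the same formula to predict the cost of unseen queries from the same class.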
Dynamic Maintenance of Data Distribution for Selectivity Estimation
The VLDB Journal, 1994
Cited by 21 (9 self)
We propose a new dynamic method for multidimensional selectivity estimation for range queries that works accurately regardless of the data distribution. Good estimation of selectivity is important for query optimization and physical database design. Our method employs the Multilevel Grid File (MLGF) for accurate estimation of multidimensional data distribution. The MLGF is a dynamic, hierarchical, balanced multidimensional file structure that gracefully adapts to nonuniform and correlated distributions. We show that the MLGF directory naturally represents a multidimensional data distribution. We then extend it for further refinement and present the selectivity estimation method based on the MLGF. Extensive experiments have been performed to test the accuracy of selectivity estimation. The results show that estimation errors are very small regardless of the distribution, even for correlated and/or highly skewed data. Finally, we analyze the cause of errors in estimation and investigate the eff...
Time-Constrained Query Processing in CASE-DB
1995
Cited by 19 (3 self)
CASE-DB is a real-time, single-user, relational prototype DBMS that permits the specification of strict time constraints for relational algebra queries. Given a time-constrained non-aggregate relational algebra query and a "fragment chain" for each relation involved in the query, CASE-DB initially obtains a response to a modified version of the query and then uses an "iterative query evaluation" technique to successively improve and evaluate the modified version of the query. CASE-DB controls the risk of overspending the time quota at each step using a "risk control technique".
1 Introduction
A real-time database has strict, real-time timing constraints in responding to queries. A time-constrained query is of the form "evaluate the query Q in at most t time units". In a multi-user, real-time DBMS, the resources (i.e., CPU and data) are shared, and the issue of meeting the time constraint in evaluating the query becomes complicated due to CPU scheduling and transaction management (concu...
Aqua project white paper
1997
Cited by 16 (10 self)
Viswanath Poosala
In large data recording and warehousing environments, it is often advantageous to provide fast, approximate answers to queries, whenever possible. The goal is to provide an estimated response in orders of magnitude less time than the time to compute an exact answer, by avoiding or minimizing the number of accesses to the base data. This white paper describes the Approximate QUery Answering (AQUA) Project underway in the Information Sciences Research Center at Bell Labs. We present a framework for an approximate query engine that observes new data as it arrives and maintains small synopsis data structures on that data. These data structures are used to provide fast, approximate answers to a broad class of queries. We describe metrics for evaluating approximate query answers. We also present new synopsis data structures, and new techniques for approximate query answers. We report on the goals and status of the Aqua project, and plans for future work.
Optimizing boolean expressions in object bases
In Proc. of the Conf. on Very Large Data Bases (VLDB), 1992
Cited by 13 (8 self)
In this paper we address the problem of optimizing the evaluation of boolean expressions in the context of object-oriented data modelling. We develop a new heuristic for optimizing the evaluation sequence of
An Integrated Method for Estimating Selectivities in a Multidatabase System
In Proceedings of the 1993 Conference of the Centre for Advanced Studies on Collaborative Research, 1993
Cited by 10 (5 self)
A multidatabase system (MDBS) integrates information from autonomous local databases managed by different database management systems (DBMSs) in a distributed environment. A number of challenges are raised for query optimization in such an MDBS. One of the major challenges is that some local optimization information may not be available at the global level. We recently proposed a query sampling method to derive cost estimation formulas for local databases in an MDBS [22]. To use the derived formulas to estimate the costs of queries, we need to know the selectivities of the qualifications of the queries. Unfortunately, existing methods for estimating selectivities cannot be used efficiently in an MDBS environment. This paper discusses the difficulties of estimating selectivities in an MDBS. Based on the discussion, this paper presents an integrated method to estimate selectivities in an MDBS. The method integrates and extends several existing methods so that they can be used in an MDBS eff...

