A Query Sampling Method for Estimating Local Cost Parameters in a Multidatabase System
- In IEEE International Conference on Data Engineering, 1994
Cited by 31 (8 self)
In a multidatabase system (MDBS), some query optimization information related to local database systems may not be available at the global level because of local autonomy. To perform global query optimization, a method is required to derive the necessary local information. This paper presents a new method that employs a query sampling technique to estimate the cost parameters of an autonomous local database system. We introduce a classification for grouping local queries and suggest a cost estimation formula for the queries in each class. We present a procedure to draw a sample of queries from each class and use the observed costs of sample queries to determine the cost parameters by multiple regression. Experimental results indicate that the method is quite promising for estimating the cost of local queries in an MDBS.
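The regression step described in this abstract can be illustrated with a small sketch. The abstract does not give the cost formula or its explanatory variables, so the linear model below (cost as a function of operand size and result size) and every function name are assumptions made purely for illustration. It fits the coefficients by ordinary least squares using the normal equations.

```python
# Hypothetical sketch: estimate cost parameters for ONE query class from
# the observed costs of sample queries. The model form
#   cost = b0 + b1 * operand_size + b2 * result_size
# is an assumption, not the formula from the paper.

def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: the sample lacks variation")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_cost_parameters(samples):
    """samples: list of (operand_size, result_size, observed_cost)."""
    rows = [[1.0, float(n), float(r)] for n, r, _ in samples]
    costs = [float(c) for _, _, c in samples]
    k = 3  # intercept plus two explanatory variables
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(rows, costs)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def estimate_cost(params, operand_size, result_size):
    """Apply the fitted formula to a new query of the same class."""
    b0, b1, b2 = params
    return b0 + b1 * operand_size + b2 * result_size
```

In this sketch each query class would get its own call to `fit_cost_parameters`, using only the sample queries drawn from that class.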
Solving Local Cost Estimation Problem for Global Query Optimization in Multidatabase Systems
- Distributed and Parallel Databases, 1998
Cited by 15 (5 self)
To meet users' growing needs for accessing pre-existing heterogeneous databases, a multidatabase system (MDBS) integrating multiple databases has attracted many researchers recently. A key feature of an MDBS is local autonomy. For a query retrieving data from multiple databases, global query optimization should be performed to achieve good system performance. There are a number of new challenges for global query optimization in an MDBS. Among them, a major one is that some local optimization information, such as local cost parameters, may not be available at the global level because of local autonomy. This creates difficulties in finding a good decomposition of a global query during query optimization. To tackle this challenge, a new query sampling method is proposed in this paper. The idea is to group component queries into homogeneous classes, draw a sample of queries from each class, and use observed costs of sample queries to derive a cost formula for each class by multiple regres...
The CORDS Multidatabase Project
- 1995
Cited by 7 (5 self)
In virtually every organization, data is stored in a variety of ways and managed by different database and file systems. Applications that require data from multiple sources are complex because they must be aware of and deal with the specifics of each data source. They must also perform any data integration needed, for example, joining data from multiple sources. The objective of a multidatabase system is to provide application developers and end users with an integrated view of and a uniform interface to all the required data. The view and the interface should be independent of where the data is stored and how it is managed. CORDS is a research project focussed on distributed applications. It is a collaborative effort involving IBM and several universities. As part of this project, we are designing and prototyping a multidatabase system. This paper provides an overview of its architecture and describes the approach taken in the following areas: management of catalog information, sch...
A Piggyback Method to Collect Statistics for Query Optimization in Database Management Systems
- 1998
Cited by 5 (2 self)
A database management system (DBMS) usually performs query optimization based on statistical information about data in the underlying database. Out-of-date statistics may lead to inefficient query processing in the system. Existing solutions to this problem have some drawbacks such as heavy administrative burden, high system load, and tardy updates. To overcome these drawbacks, our new approach, called the piggyback method, is proposed in this paper. The key idea is to piggyback some additional retrievals during the processing of a user query in order to collect more up-to-date statistics. The collected statistics are used to optimize the processing of subsequent queries. To specify the piggybacked queries, basic piggybacking operators are defined in this paper. Using the operators, several types of piggybacking such as vertical, horizontal, mixed vertical and horizontal, and multi-query piggybacking are introduced. Statistics that can be obtained from different access methods...
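The key idea in this abstract, collecting statistics as a side effect of processing a user query, can be sketched roughly as follows. This is not the paper's set of piggybacking operators, which the abstract does not define; `ColumnStats` and `piggybacked_scan` are hypothetical names, and tracking distinct values with an exact in-memory set is a simplifying assumption that a real system would likely replace with a bounded summary.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnStats:
    """Running statistics for one column (hypothetical structure)."""
    count: int = 0
    minimum: object = None
    maximum: object = None
    distinct: set = field(default_factory=set)

    def observe(self, value):
        self.count += 1
        if self.minimum is None or value < self.minimum:
            self.minimum = value
        if self.maximum is None or value > self.maximum:
            self.maximum = value
        self.distinct.add(value)


def piggybacked_scan(rows, predicate, stats_column, stats):
    """Yield the rows that satisfy the user's predicate while recording
    statistics on stats_column for every row the scan touches."""
    for row in rows:
        # The statistic is updated for all scanned rows, not only the
        # rows returned to the user, because the scan reads them anyway.
        stats.observe(row[stats_column])
        if predicate(row):
            yield row
```

Because the function is a generator, the statistics are complete only after the caller has consumed the whole result, which mirrors the fact that the extra work rides along with the user query rather than running separately.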
Selectivity Estimation for Joins Using Systematic Sampling
- In Proceedings of International Workshop on Database and Expert System Applications, 1997
Cited by 4 (0 self)
We propose a new approach to the estimation of join selectivity. The technique, which we have called "systematic sampling", is a novel variant of the sampling-based approach. Systematic sampling works as follows: Given a relation R of N tuples, with a join attribute that can be accessed in ascending/descending order via an index, if n is the number of tuples to be sampled from R, select a tuple at random from the first $k = \lceil N/n \rceil$ tuples of R and every kth tuple thereafter. We first develop a theoretical foundation for systematic sampling which suggests that the method gives a more representative sample than the traditional simple random sampling. Subsequent experimental analysis on a range of synthetic relations confirms that the quality of sample relations (participating in a join) yielded by systematic sampling is higher than those produced by the traditional simple random sampling. To ensure that the sample relations produced by the systematic sampling indeed assist in computat...
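The selection rule stated in this abstract can be written as a short sketch, reading the step size k as the ceiling of N/n. The function name and the use of an in-memory list in place of index-ordered access to the relation are assumptions for illustration.

```python
import math
import random


def systematic_sample(tuples, n, rng=random):
    """Systematic sample of roughly n items.

    tuples: a sequence assumed to be already ordered on the join
            attribute (standing in for index-ordered access).
    rng:    any object with a randrange method, e.g. random.Random(seed).
    """
    total = len(tuples)
    if n <= 0 or total == 0:
        return []
    k = math.ceil(total / n)      # step size k = ceil(N / n)
    start = rng.randrange(k)      # random tuple among the first k
    # Take the starting tuple and every kth tuple thereafter.
    return [tuples[i] for i in range(start, total, k)]
```

When N is not a multiple of n, the sample can contain slightly fewer than n tuples, depending on the random start.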
Piggyback statistics collection for query optimization: Towards a self-maintaining database management system
- Computer Journal, 2004
Cited by 3 (0 self)
A database management system (DBMS) performs query optimization based on statistical information about data in the underlying database. Out-of-date statistics may lead to inefficient query processing in the system. The existing utility method, which collects statistics in batch mode, suffers from drawbacks such as heavy administrative burden, high system load and tardy updates. In this paper, we study approaches to performing statistical analysis on the fly during query execution, taking advantage of data already resident in main memory. We propose a framework for on-the-fly statistics collection, which we term piggybacking, and analyze the tradeoffs of piggybacking various statistics collection techniques on top of query execution plans. We present a multiple-granularity interleaving algorithm to integrate a set of piggyback operations with an execution plan, and show how the algorithm can be incorporated into an existing query optimizer. Our experiments demonstrate that useful statistics can be obtained via the piggyback method with a small overhead.
Query Result Size Estimation Techniques in Database Systems
- 1998
Cited by 1 (0 self)
Query optimisers are critical to the efficiency of modern relational database systems. If a query optimiser chooses a poor query execution plan, the performance of the database system in answering the query can be very poor. In fact, the differences in cost between the least and most expensive query execution plans can be several orders of magnitude. On the other hand, it can be prohibitively expensive for the query optimiser to search exhaustively for the least-cost (strictly optimal) query execution plan. Most query optimisers, therefore, compromise by using a reasonably cheap search to obtain a reasonably cheap query execution plan. Accurate, but inexpensive, query size estimation is fundamental to the success of real query optimisers. A number of studies [Christodoulakis 1984; Ioannidis and Christodoulakis 1991, 1993] have demonstrated that optimisers can select very expensive query execution plans if they are forced to rely on poor or inaccurate query size estimates. This thesis will address the problem of how to obtain reliable and accurate query size estimation for the cost calculation of query execution plans.

