## A Query Sampling Method for Estimating Local Cost Parameters in a Multidatabase System (1994)

Venue: IEEE International Conference on Data Engineering

Citations: 31 (8 self)

### BibTeX

@INPROCEEDINGS{Zhu94aquery,

author = {Qiang Zhu and Per-Åke Larson},

title = {A Query Sampling Method for Estimating Local Cost Parameters in a Multidatabase System},

booktitle = {IEEE International Conference on Data Engineering},

year = {1994},

pages = {144--153},

publisher = {}

}

### Abstract

In a multidatabase system (MDBS), some query optimization information related to local database systems may not be available at the global level because of local autonomy. To perform global query optimization, a method is required to derive the necessary local information. This paper presents a new method that employs a query sampling technique to estimate the cost parameters of an autonomous local database system. We introduce a classification for grouping local queries and suggest a cost estimation formula for the queries in each class. We present a procedure to draw a sample of queries from each class and use the observed costs of sample queries to determine the cost parameters by multiple regression. Experimental results indicate that the method is quite promising for estimating the cost of local queries in an MDBS.
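The regression step mentioned in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the cost `Y` of a query in one class is linear in the operand table cardinality `N` and in the product of `N` and the selectivity `S`; the paper's actual cost formulas are not reproduced on this page, and the function names below are hypothetical.

```python
from typing import List, Sequence, Tuple


def solve_linear_system(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] so row operations apply to both.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular; sample more varied queries")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_cost_parameters(samples: Sequence[Tuple[float, float, float]]) -> List[float]:
    """Fit [b0, b1, b2] by least squares from (N, S, observed cost Y) samples.

    ASSUMED model (not taken from the paper): Y = b0 + b1*N + b2*N*S.
    """
    rows = [[1.0, n, n * s] for n, s, _ in samples]
    ys = [y for _, _, y in samples]
    k = 3
    # Normal equations: (X^T X) b = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def estimate_cost(coeffs: Sequence[float], n_tuples: float, selectivity: float) -> float:
    """Predict the cost of a new query in the same class under the assumed model."""
    b0, b1, b2 = coeffs
    return b0 + b1 * n_tuples + b2 * n_tuples * selectivity


if __name__ == "__main__":
    # Synthetic "observed" costs for sample queries of one class (made-up numbers).
    observed = [(n, s, 0.5 + 0.002 * n + 0.01 * n * s)
                for n in (100, 200, 500, 1000) for s in (0.1, 0.5, 0.9)]
    coeffs = fit_cost_parameters(observed)
    print("fitted coefficients:", coeffs)
    print("estimated cost:", estimate_cost(coeffs, 300, 0.2))
```

In a real MDBS setting the observed costs would come from timing sample queries on the autonomous local system; the sketch only shows how sampled observations could be turned into cost parameters by solving the normal equations.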

### Citations

184 | Equi-depth histograms for estimating selectivity factors for multi-dimensional queries
- Muralikrishna, DeWitt
- 1988
Citation Context: ...re, but all of them perform data sampling (i.e., sample data from underlying databases) instead of query sampling (i.e., sample queries from a query class). Muralikrishna and Piatetsky-Shapiro et al. [6; 9] discussed using data sampling to build approximate selectivity histograms. Hou and Lipton et al. [3; 4] investigated several different data sampling techniques, e.g., simple sampling, adaptive sampli...

154 | Practical selectivity estimation through adaptive sampling
- Lipton, Naughton, et al.
- 1990
Citation Context: ... sampling (i.e., sample queries from a query class). Muralikrishna and Piatetsky-Shapiro et al. [6; 9] discussed using data sampling to build approximate selectivity histograms. Hou and Lipton et al. [3; 4] investigated several different data sampling techniques, e.g., simple sampling, adaptive sampling and double sampling, to estimate the size of a query result. Olken et al. [7] considered the problem ...

76 | Query optimization in heterogeneous DBMS
- Du, Krishnamurthy, et al.
- 1992
Citation Context: ...r can make use of both global and local information to produce a good execution plan for a given query. In an MDBS, methods to derive or estimate local query optimization information are required. In [2], Du et al. proposed a calibration method to deduce necessary local information. The idea is to construct a local synthetic calibrating database with special properties and to run a set of queries aga...

56 | Simple random sampling from relational databases
- Olken, Rotem
- 1986
Citation Context: ...ou and Lipton et al. [3; 4] investigated several different data sampling techniques, e.g., simple sampling, adaptive sampling and double sampling, to estimate the size of a query result. Olken et al. [7] considered the problem of constructing a random subset of a query result without computing the full result. All their work is about performing a given query against a sample of data and deriving prop...

30 | Estimating record selectivities
- Christodoulakis
- 1983
Citation Context: ...) and the selectivity of the qualification of a given query. The number of tuples usually can be found in the catalog of a local database system. Selectivities can be estimated by a parametric method [1], a table-based method [6; 9] or data-sampling-based methods [4]. However, new issues need to be solved in an MDBS environment, for example, how to draw sample data from local tables efficiently und...

14 | Query optimization in multidatabase systems
- Zhu
- 1992
Citation Context: ...ions. Most differences between a conventional distributed database system (DDBS) and an MDBS are caused by local autonomy. These differences introduce new challenges for query optimization in an MDBS [5; 10; 12]. Among the challenges, a crucial one is that some local query optimization informa... (Footnote: Research supported by IBM Canada Laboratory and Natural Sciences and Engineering Research Council (NSERC) of Canada.)

11 | Accurate estimation of the number of tuples satisfying a condition
- Piatetsky-Shapiro, Connell
- 1984
Citation Context: ...re, but all of them perform data sampling (i.e., sample data from underlying databases) instead of query sampling (i.e., sample queries from a query class). Muralikrishna and Piatetsky-Shapiro et al. [6; 9] discussed using data sampling to build approximate selectivity histograms. Hou and Lipton et al. [3; 4] investigated several different data sampling techniques, e.g., simple sampling, adaptive sampli...

10 | An integrated method for estimating selectivities in a multidatabase system
- Zhu
- 1993
Citation Context: ...costs; dashed line: observed costs. (Figure 4: Costs of Test Queries in G13" on EMPRESS.) ...terfaces can be used if a data-sampling-based method is adopted. These issues are addressed in a separate paper [11], and an integrated method for estimating selectivities in an MDBS is presented in the paper. 7 Conclusion. In this paper we have proposed a method that employs a query sampling technique and multiple...

9 | On global query optimization in multidatabase systems
- Lu, Shan
- 1992
Citation Context: ...ions. Most differences between a conventional distributed database system (DDBS) and an MDBS are caused by local autonomy. These differences introduce new challenges for query optimization in an MDBS [5; 10; 12]. Among the challenges, a crucial one is that some local query optimization informa... (Footnote: Research supported by IBM Canada Laboratory and Natural Sciences and Engineering Research Council (NSERC) of Canada.)

8 | Establishing a fuzzy cost model for query optimization in a multidatabase system
- Zhu, Larson
- 1994
Citation Context: ...ions. Most differences between a conventional distributed database system (DDBS) and an MDBS are caused by local autonomy. These differences introduce new challenges for query optimization in an MDBS [5; 10; 12]. Among the challenges, a crucial one is that some local query optimization informa... (Footnote: Research supported by IBM Canada Laboratory and Natural Sciences and Engineering Research Council (NSERC) of Canada.)

3 | Error-constrained COUNT query evaluation in relational databases
- Hou, et al.
- 1991
Citation Context: ... sampling (i.e., sample queries from a query class). Muralikrishna and Piatetsky-Shapiro et al. [6; 9] discussed using data sampling to build approximate selectivity histograms. Hou and Lipton et al. [3; 4] investigated several different data sampling techniques, e.g., simple sampling, adaptive sampling and double sampling, to estimate the size of a query result. Olken et al. [7] considered the problem ...

3 | Statistical Methods for Business and
- Pfaffenberger, Patterson
- 1987
Citation Context: ...${}^{(m)}_{1k}$ be the number of tuples in the operand table of $Q^{(m)}_{1k}$, $S^{(m)}_{1k}$ be the selectivity of $Q^{(m)}_{1k}$, $Y^{(m)}_{1k}$ be the cost measured by executing $Q^{(m)}_{1k}$. Applying the method of least squares [8], we can derive a system of normal equations (omitted) for (9) that takes $Y^{(m)}_{1k}$, $N^{(m)}_{1k}$, $S^{(m)}_{1k}$ as inputs and produces the estimates of the coefficients $\beta^{0}_{1k}$, $\beta^{1}_{1k}$ and $\beta^{2}_{1k}$. For $G_{2k}$...