## An Integrated Method for Estimating Selectivities in a Multidatabase System (1993)

Venue: In Proceedings of the 1993 conference of the Centre for Advanced Studies on Collaborative research

Citations: 10 (5 self)

### BibTeX

@INPROCEEDINGS{Zhu93anintegrated,

author = {Qiang Zhu},

title = {An Integrated Method for Estimating Selectivities in a Multidatabase System},

booktitle = {In Proceedings of the 1993 conference of the Centre for Advanced Studies on Collaborative research},

year = {1993},

pages = {832--847},

publisher = {IBM Press}

}

### Abstract

A multidatabase system (MDBS) integrates information from autonomous local databases managed by different database management systems (DBMSs) in a distributed environment. Query optimization in such an MDBS raises a number of challenges. One of the major challenges is that some local optimization information may not be available at the global level. We recently proposed a query sampling method to derive cost estimation formulas for local databases in an MDBS [22]. To use the derived formulas to estimate the costs of queries, we need to know the selectivities of the qualifications of the queries. Unfortunately, existing methods for estimating selectivities cannot be used efficiently in an MDBS environment. This paper discusses the difficulties of estimating selectivities in an MDBS. Based on the discussion, this paper presents an integrated method to estimate selectivities in an MDBS. The method integrates and extends several existing methods so that they can be used in an MDBS eff...

### Citations

188 |
Equi-Depth Histograms For Estimating Selectivity Factors For Multi-Dimensional Queries
- Muralikrishna, DeWitt
- 1988
Citation Context: ... not hold. Therefore, it may not always be feasible in an MDBS because it is sometimes hard to decide which assumptions are suitable for the data in an autonomous local database. A table-based method [15; 19] scans (possibly part of) the underlying data periodically to collect necessary statistics in a table and uses the statistics to estimate selectivities. This method makes no assumptions about the underlying ...
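
The table-based approach in this context (for example, the equi-depth histograms of Muralikrishna and DeWitt) can be illustrated with a minimal Python sketch, not taken from the paper. A scan builds bucket boundaries that each cover roughly the same number of rows, and a range predicate is estimated by crediting each overlapped bucket in proportion to the overlap, which assumes values are uniform within a bucket. The names `build_equi_depth` and `estimate_range` are hypothetical.

```python
from typing import List, Sequence


def build_equi_depth(values: Sequence[float], num_buckets: int) -> List[float]:
    """Return num_buckets + 1 boundaries; each bucket holds roughly len(values) / num_buckets values."""
    data = sorted(values)
    n = len(data)
    if n == 0 or num_buckets < 1:
        raise ValueError("need at least one value and one bucket")
    bounds = [data[0]]
    for i in range(1, num_buckets):
        bounds.append(data[(i * n) // num_buckets])  # value at the i-th quantile position
    bounds.append(data[-1])
    return bounds


def estimate_range(bounds: List[float], lo: float, hi: float) -> float:
    """Estimate the fraction of rows with lo <= value <= hi.

    Each bucket is credited with 1 / (number of buckets) of the rows, and values
    are assumed to be spread uniformly inside a bucket (an assumption of this sketch).
    """
    k = len(bounds) - 1
    total = 0.0
    for i in range(k):
        b_lo, b_hi = bounds[i], bounds[i + 1]
        width = b_hi - b_lo
        if width == 0:
            # Degenerate bucket: every value in it equals b_lo.
            if lo <= b_lo <= hi:
                total += 1.0 / k
            continue
        overlap = min(hi, b_hi) - max(lo, b_lo)
        if overlap > 0:
            total += (overlap / width) / k
    return min(total, 1.0)
```

With the values 0 to 99 and four buckets, the boundaries are [0, 25, 50, 75, 99] and `estimate_range(bounds, 0, 50)` gives 0.5, against a true fraction of 0.51. The cost in an MDBS is the periodic scan of autonomous local data needed to keep such a table current.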

157 |
Practical selectivity estimation through adaptive sampling
- Lipton, Naughton, et al.
- 1990
Citation Context: ...amount of detailed statistics about local databases if it were used in an MDBS. This requirement is hard to meet efficiently in a distributed and complicated MDBS environment. A sampling-based method [6; 7; 8; 10; 11] performs a given query on a sample of underlying data and uses the query result to estimate the selectivity for the query. This method makes no assumptions about the underlying data and does not requ...
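
A sampling-based estimate, in its simplest form, runs the predicate over a random sample and reports the fraction of qualifying tuples. The sketch below assumes the table is an in-memory Python sequence and draws a simple random sample with replacement; the name `sample_selectivity` and its parameters are illustrative, not taken from any of the cited papers.

```python
import random
from typing import Any, Callable, Sequence


def sample_selectivity(table: Sequence[Any], predicate: Callable[[Any], bool],
                       sample_size: int, seed: int = 0) -> float:
    """Estimate a predicate's selectivity as the qualifying fraction of a random sample."""
    if not table or sample_size < 1:
        raise ValueError("need a non-empty table and a positive sample size")
    rng = random.Random(seed)
    # Simple random sampling with replacement: each draw picks any row with equal probability.
    sample = [table[rng.randrange(len(table))] for _ in range(sample_size)]
    hits = sum(1 for row in sample if predicate(row))
    return hits / sample_size
```

For a true selectivity s and sample size n, the standard error of this estimate is about sqrt(s(1 - s)/n), so halving the error takes four times as many sample units.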

98 |
A taxonomy and current issues in multidatabase systems
- Bright, Hurson, et al.
- 1992
Citation Context: ...ty to global users and interacts with the local DBMSs at their external user interfaces. A key feature of an MDBS is the local autonomy that individual databases retain to serve existing applications [2]. Most differences between a conventional distributed database system (DDBS) and an MDBS are caused by local autonomy. These differences raise new challenges for query optimization in an MDBS [13; 21...

85 |
Sequential sampling procedures for query size estimation
- Haas, Swami
- 1992
Citation Context: ... our MDBS. Two types of sampling-based methods for estimating selectivities are described in the literature. All were designed for (centralized) DBMSs. One type uses the sequential sampling technique [5; 9; 10; 11]. It is characterized by its sequential sample gathering and its stopping condition. That is, sample units are taken one at a time. Checking the outcome of each sample unit allows a decision to be ma...
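
The sequential idea (take sample units one at a time and check a stopping condition after each) can be sketched as below. The stopping rule used here, stop once a target number of qualifying units has been seen or a cap on draws is reached, is a simple illustration in the spirit of adaptive sampling, not the exact procedure of Haas and Swami or of Lipton and Naughton; `sequential_selectivity`, `target_hits`, and `max_draws` are hypothetical names.

```python
import random
from typing import Any, Callable, Sequence, Tuple


def sequential_selectivity(table: Sequence[Any], predicate: Callable[[Any], bool],
                           target_hits: int, max_draws: int,
                           seed: int = 0) -> Tuple[float, int]:
    """Draw sample units one at a time and stop once target_hits qualifying units
    have been seen or max_draws units have been drawn.

    Returns (estimated selectivity, number of units drawn).
    """
    if not table or target_hits < 1 or max_draws < 1:
        raise ValueError("need a non-empty table, target_hits >= 1 and max_draws >= 1")
    rng = random.Random(seed)
    hits = 0
    draws = 0
    # The stopping condition is checked after every single sample unit.
    while draws < max_draws and hits < target_hits:
        row = table[rng.randrange(len(table))]  # one unit, drawn with replacement
        draws += 1
        if predicate(row):
            hits += 1
    return hits / draws, draws
```

Stopping on a hit count adapts the sample size to the selectivity: a predicate with selectivity 0.5 and `target_hits=200` stops after roughly 400 draws, while a very selective predicate runs until `max_draws`. The plain ratio `hits / draws` is slightly biased under a hit-count stopping rule; for pure inverse binomial sampling, (k - 1)/(n - 1) is the unbiased alternative.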

77 |
Query optimization in heterogeneous DBMS
- Du, Krishnamurthy, et al.
- 1992
Citation Context: ...e caused by local autonomy. These differences raise new challenges for query optimization in an MDBS [13; 21]. However, to date only a few papers have been published on query optimization in an MDBS [4; 12; 13; 21]. Many issues remain unsolved. Among the many challenges for query optimization in an MDBS, the crucial one is that some local query optimization information, for example, local cost functions and so...

69 |
Query size estimation by adaptive sampling
- Lipton, Naughton
- 1995
Citation Context: ...amount of detailed statistics about local databases if it were used in an MDBS. This requirement is hard to meet efficiently in a distributed and complicated MDBS environment. A sampling-based method [6; 7; 8; 10; 11] performs a given query on a sample of underlying data and uses the query result to estimate the selectivity for the query. This method makes no assumptions about the underlying data and does not requ...

59 |
Access path selection in a relational database management system
- Selinger, et al.
- 1979
Citation Context: ...as estimated as $1/\max\{DV(R_1.a), DV(R_2.b)\}$ (that is true only if each value in the column with the smaller cardinality has a matching value in the other column) in the formula of Selinger et al. [18] under the uniform assumption and was estimated as a constant value 0.3 (that has little significance) in the formula of Makinouchi et al. [14], and could not be estimated in Christodoulakis' formula...
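
The uniform-assumption formula quoted in this context, 1/max{DV(R1.a), DV(R2.b)} for the equijoin R1.a = R2.b, is easy to check against a brute-force count. In this sketch DV is computed as the number of distinct values in a column; the function names are hypothetical, and the usage cases show the containment condition the context mentions both holding and failing.

```python
from collections import Counter
from typing import Hashable, Sequence


def join_selectivity(col_a: Sequence[Hashable], col_b: Sequence[Hashable]) -> float:
    """Uniform-assumption selectivity of R1.a = R2.b: 1 / max(DV(R1.a), DV(R2.b))."""
    return 1.0 / max(len(set(col_a)), len(set(col_b)))


def estimated_join_size(col_a: Sequence[Hashable], col_b: Sequence[Hashable]) -> float:
    """Estimated cardinality |R1| * |R2| * selectivity."""
    return len(col_a) * len(col_b) * join_selectivity(col_a, col_b)


def actual_join_size(col_a: Sequence[Hashable], col_b: Sequence[Hashable]) -> int:
    """Exact equijoin cardinality, counting matching pairs."""
    counts = Counter(col_b)
    return sum(counts[v] for v in col_a)
```

For `a = [1, 2, 3, 4] * 5` and `b = list(range(1, 9)) * 2`, every value of `a` appears in `b`, and both the estimate and the true join size are 40. Replacing `b` with `[5, 6, 7, 8] * 4` raises the estimate to 80 while the true size drops to 0.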

31 |
A query sampling method of estimating local cost parameters in a multidatabase system
- Zhu, Larson
- 1993
Citation Context: ...ges is that some local optimization information may not be available at the global level. We recently proposed a query sampling method to derive cost estimation formulas for local databases in an MDBS [22]. To use the derived formulas to estimate the costs of queries, we need to know the selectivities of the qualifications of the queries. Unfortunately, existing methods for estimating selectivities ca...

30 |
Estimating record selectivities
- Christodoulakis
- 1983
Citation Context: ...sampling-based methods. A selectivity, in fact, reflects some properties of the underlying data. Estimating a selectivity thus requires some information about the underlying data. A parametric method [3; 14; 17] makes assumptions about the underlying data, for example, uniform distribution and independence of columns, and then uses certain formulas with parameters to estimate selectivities. This method is...
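
A parametric method in the sense used here can be as simple as the following sketch: a range predicate on a column is estimated from the column's minimum and maximum under a uniform-distribution assumption, and a conjunction is estimated as a product under an independence assumption. The functions are hypothetical and only illustrate these two assumptions, not the specific formulas of [3], [14], or [17].

```python
from math import prod
from typing import Iterable


def range_selectivity(lo: float, hi: float, col_min: float, col_max: float) -> float:
    """Selectivity of lo <= col <= hi, assuming col is uniform on [col_min, col_max]."""
    if col_max <= col_min:
        # Single-valued column: the predicate selects all rows or none.
        return 1.0 if lo <= col_min <= hi else 0.0
    overlap = min(hi, col_max) - max(lo, col_min)
    return max(0.0, overlap) / (col_max - col_min)


def and_selectivity(selectivities: Iterable[float]) -> float:
    """Selectivity of a conjunction, assuming the columns are independent."""
    return prod(selectivities)
```

For `0 <= x <= 25 AND 0 <= y <= 50` on columns uniform over [0, 100], the estimate is 0.25 × 0.5 = 0.125. If `x` and `y` are strongly correlated, the true selectivity can be far from this, which is why such assumptions are hard to validate for an autonomous local database.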

24 |
Estimating the size of generalized transitive closures
- Lipton, Naughton
- 1989
Citation Context: ... our MDBS. Two types of sampling-based methods for estimating selectivities are described in the literature. All were designed for (centralized) DBMSs. One type uses the sequential sampling technique [5; 9; 10; 11]. It is characterized by its sequential sample gathering and its stopping condition. That is, sample units are taken one at a time. Checking the outcome of each sample unit allows a decision to be ma...

24 |
Random sampling from B+ trees
- Olken, Rotem
- 1989
Citation Context: ...y Q. The sampling-based method is obviously not suitable in this case. A sample tuple can be efficiently drawn from a table if there is an indexed dense key column [11]¹ or a B+-tree indexed column [11; 16] of the table. This condition cannot be guaranteed in an MDBS. Although we will not use a sampling-ba... (¹ By "dense" we mean that there are no gaps between two consecutive values appearing in the table.)
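
The condition about an indexed dense key column can be made concrete: if keys fill an integer range with no gaps, probing a uniformly random key in that range always hits a tuple, so each sample tuple costs one index lookup. The sketch below uses a Python `dict` as a stand-in for the index and retries on misses, which still yields a uniform sample when the keys have gaps but wastes probes; `sample_by_key` is a hypothetical name.

```python
import random
from typing import Any, Dict, List, Tuple


def sample_by_key(index: Dict[int, Any], n: int, seed: int = 0) -> Tuple[List[Any], int]:
    """Draw n tuples uniformly at random (with replacement) by probing random keys.

    Returns the sampled tuples and the number of index probes used. If the key
    column is dense (no gaps), every probe hits, so probes == n.
    """
    if not index:
        raise ValueError("index is empty")
    key_min, key_max = min(index), max(index)
    rng = random.Random(seed)
    sample: List[Any] = []
    probes = 0
    while len(sample) < n:
        probes += 1
        key = rng.randint(key_min, key_max)  # uniform over the whole key range
        if key in index:  # a miss is only possible when the key column has gaps
            sample.append(index[key])
    return sample, probes
```

On a dense index of 100 keys, 50 samples cost exactly 50 probes; on an index holding only the keys 0 and 999, each accepted sample costs about 500 probes on average, which illustrates why the quoted condition matters.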

20 |
On global multidatabase query optimization
- Lu, Ooi, et al.
- 1992

18 |
Access Path Selection in Distributed Data Base Management Systems
- Selinger, Adiba
- 1980
Citation Context: ...sampling-based methods. A selectivity, in fact, reflects some properties of the underlying data. Estimating a selectivity thus requires some information about the underlying data. A parametric method [3; 14; 17] makes assumptions about the underlying data, for example, uniform distribution and independence of columns, and then uses certain formulas with parameters to estimate selectivities. This method is...

14 |
Query optimization in multidatabase systems
- Zhu
- 1992
Citation Context: ...ons [2]. Most differences between a conventional distributed database system (DDBS) and an MDBS are caused by local autonomy. These differences raise new challenges for query optimization in an MDBS [13; 21]. However, to date only a few papers have been published on query optimization in an MDBS [4; 12; 13; 21]. Many issues remain unsolved. Among the many challenges for query optimization in an MDBS, t...

12 |
A supplement to sampling-based methods for query size estimation in a database system
- Ling, Sun
- 1992
Citation Context: ...amount of detailed statistics about local databases if it were used in an MDBS. This requirement is hard to meet efficiently in a distributed and complicated MDBS environment. A sampling-based method [6; 7; 8; 10; 11] performs a given query on a sample of underlying data and uses the query result to estimate the selectivity for the query. This method makes no assumptions about the underlying data and does not requ...

11 |
Accurate estimation of the number of tuples satisfying a condition
- Shapiro, Connell
- 1984
Citation Context: ... not hold. Therefore, it may not always be feasible in an MDBS because it is sometimes hard to decide which assumptions are suitable for the data in an autonomous local database. A table-based method [15; 19] scans (possibly part of) the underlying data periodically to collect necessary statistics in a table and uses the statistics to estimate selectivities. This method makes no assumptions about the underlying ...

9 |
On global query optimization in multidatabase systems
- Lu, Shan
- 1992
Citation Context: ...ons [2]. Most differences between a conventional distributed database system (DDBS) and an MDBS are caused by local autonomy. These differences raise new challenges for query optimization in an MDBS [13; 21]. However, to date only a few papers have been published on query optimization in an MDBS [4; 12; 13; 21]. Many issues remain unsolved. Among the many challenges for query optimization in an MDBS, t...

8 |
Establishing a fuzzy cost model for query optimization in a multidatabase system
- Zhu, Larson
- 1994

5 |
Adaptive techniques for distributed query optimization
- Yu, Lilien, et al.
- 1986
Citation Context: ...found in several existing DBMSs, while the second method is our new idea. Although modifying cost parameters by using runtime information during the execution of a query can be found in previous work [20], riding additional side retrievals piggyback on query processing has not been found in the literature. [Section 4: Estimating Selectivities for Join Queries] In the last section, we applied a parametric method...

3 |
Error-constrained COUNT query evaluation in relational databases
- Hou, et al.
- 1991

2 |
Statistical estimators for relational algebra expressions
- Hou, et al.
- 1988

1 |
Applied Calculus
- Bittinger, Morrel
- 1984
Citation Context: ...$P(1) = e^{-0.8 \cdot 1} - e^{-0.8 \cdot 5} = 0.431013$. 2) $P(x)$ is difficult to derive exactly, but $C_2 - C_1$ is small ((10) is the most likely case). In this case, we apply the Trapezoidal Rule [1] with one trapezoid to numerically approximate the integral in (9), namely,

$$\int_{C_1}^{C_2} p(x)\,dx \approx \frac{1}{2}\left[p(C_1) + p(C_2)\right](C_2 - C_1) \qquad (13)$$

For example, for a normal distribution $p(x) = \frac{1}{\sqrt{2\pi}} e^{-x\ldots}$...
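
The one-trapezoid rule in this context can be compared against exact integrals. This sketch assumes an exponential density p(x) = λe^(-λx) with an illustrative rate λ = 0.8, whose mass on [C1, C2] is e^(-λC1) - e^(-λC2) (on [1, 5] this is e^(-0.8) - e^(-4) ≈ 0.431013), and the standard normal density, whose mass is computed with `math.erf`. All function names are hypothetical.

```python
import math
from typing import Callable


def trapezoid_one(p: Callable[[float], float], c1: float, c2: float) -> float:
    """Trapezoidal Rule with one trapezoid: (1/2)[p(c1) + p(c2)](c2 - c1)."""
    return 0.5 * (p(c1) + p(c2)) * (c2 - c1)


def exponential_pdf(rate: float) -> Callable[[float], float]:
    """Exponential density rate * e^(-rate * x), valid for x >= 0."""
    return lambda x: rate * math.exp(-rate * x)


def exponential_mass(rate: float, c1: float, c2: float) -> float:
    """Exact integral of the exponential density over [c1, c2], for 0 <= c1 <= c2."""
    return math.exp(-rate * c1) - math.exp(-rate * c2)


def normal_pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def normal_mass(c1: float, c2: float) -> float:
    """Exact standard normal probability of [c1, c2], via the error function."""
    return 0.5 * (math.erf(c2 / math.sqrt(2)) - math.erf(c1 / math.sqrt(2)))
```

On the short interval [1, 1.1] the one-trapezoid value differs from the exact exponential mass by about 2e-5, but on [1, 5] it gives roughly 0.748 instead of 0.431, consistent with the requirement that C2 - C1 be small.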

1 |
The optimization strategy for query evaluation in RDB/V1
- Makinouchi, et al.
- 1981
Citation Context: ...sampling-based methods. A selectivity, in fact, reflects some properties of the underlying data. Estimating a selectivity thus requires some information about the underlying data. A parametric method [3; 14; 17] makes assumptions about the underlying data, for example, uniform distribution and independence of columns, and then uses certain formulas with parameters to estimate selectivities. This method is...