## Robust cardinality and cost estimation for skyline operator (2006)

Venue: In ICDE

Citations: 33 (0 self-citations)

### BibTeX

@INPROCEEDINGS{Chaudhuri06robustcardinality,

author = {Surajit Chaudhuri and Nilesh Dalvi and Raghav Kaushik},

title = {Robust cardinality and cost estimation for skyline operator},

booktitle = {In ICDE},

year = {2006},

pages = {64}

}

### Abstract

Incorporating the skyline operator inside the relational engine requires solving the cardinality estimation and cost estimation problems, which have hitherto been unaddressed. We propose robust techniques to estimate the cardinality and the computational cost of Skyline and, through an empirical comparison, show that our techniques are substantially more effective than traditional approaches. Finally, we show through an implementation in Microsoft SQL Server that skyline queries can benefit substantially from our techniques.

### Citations

392 | The Skyline Operator
- Börzsönyi, Kossmann, et al.
- 2001
Citation Context: ...age and low price. The query returns all Toyota cars which are not “dominated”, in that there is no other Toyota car that has a lower mileage and a lower price. The Skyline operator has been proposed [15, 16, 5] to implement preference queries. Skyline takes a set of preferences as input, and returns only those tuples for which there is no other tuple that is better with respect to all preferences. While the...
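
As a minimal illustration of the dominance test described in this context, the sketch below assumes smaller is better on every attribute (for example mileage and price). It uses the common weak/strict form of dominance: a tuple dominates another if it is no worse on all attributes and strictly better on at least one. The snippet phrases the condition as "a lower mileage and a lower price", so the weak/strict variant is an assumption here, and the car data is made up.

```python
from typing import List, Sequence, Tuple


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q: no worse on every attribute, strictly better on one.
    Smaller values are assumed to be better."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Naive O(n^2) skyline: keep every point that no other point dominates.
    A point never dominates itself, so it need not be excluded explicitly."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (mileage, price) pairs for a handful of cars.
cars = [(30000, 15000), (20000, 18000), (40000, 12000), (35000, 16000), (25000, 20000)]
print(skyline(cars))  # [(30000, 15000), (20000, 18000), (40000, 12000)]
```

The quadratic scan is only meant to pin down the definition. Database-oriented algorithms avoid comparing every pair.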

199 | Selectivity estimation without the attribute value independence assumption
- Poosala, Ioannidis
- 1997
Citation Context: ...theoretical investigation of this behavior for future work. *Histograms.* Histograms have been extensively studied in the database community for query size estimation. There have been various techniques [9, 13] for using and efficiently constructing multidimensional histograms to model attribute correlations. In the context of Skyline size estimation, we can use these structures to estimate the joint distri...
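
The snippet says multidimensional histograms can capture the joint distribution for skyline size estimation, but it does not give the estimator. The sketch below is a simple illustration of how a 2-D grid histogram can be used, and it is not the paper's method. It prunes whole buckets and returns an upper bound on skyline size. The following are assumptions: values are normalized to [0, 1), smaller is better, the grid is equi-width, and `grid_histogram`, `candidate_cells`, and `skyline_upper_bound` are hypothetical names.

```python
from collections import Counter


def grid_histogram(points, k):
    """Equi-width k x k histogram over [0, 1)^2, mapping cell -> point count."""
    return Counter((min(int(x * k), k - 1), min(int(y * k), k - 1)) for x, y in points)


def candidate_cells(hist):
    """Non-empty cells not pruned by another non-empty cell with a strictly
    smaller index on both axes. Every point in a pruned cell is dominated
    (smaller is better), so skyline points can only lie in candidate cells."""
    occupied = [c for c, count in hist.items() if count > 0]
    return {c for c in occupied
            if not any(o[0] < c[0] and o[1] < c[1] for o in occupied)}


def skyline_upper_bound(hist):
    """Upper bound on skyline size: total point count of the candidate cells."""
    return sum(hist[c] for c in candidate_cells(hist))


points = [(0.05, 0.9), (0.15, 0.15), (0.5, 0.5), (0.9, 0.05), (0.95, 0.95)]
hist = grid_histogram(points, k=4)
print(sorted(candidate_cells(hist)))  # [(0, 0), (0, 3), (3, 0)]
print(skyline_upper_bound(hist))      # 3
```

The bound can be loose, because points inside candidate cells, or in cells that share a row or column, may still dominate one another.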

195 | Shooting stars in the sky: An online algorithm for skyline queries (VLDB)
- Kossmann, Ramsak, et al.
- 2002
Citation Context: ...o been studied under the name of maximum vector problem [8]. While this work assumed that the whole set of points fit into memory, several algorithms suitable for a database system have been proposed [15, 16, 5, 7, 12]. In this paper, our focus is on algorithms that can be directly implemented in today’s commercial database systems without the addition of new access methods (which would require addressing the assoc...
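
The snippet lists several database-oriented algorithms without describing them. As one example, below is a simplified sketch of block-nested-loops (BNL) from Börzsönyi et al. The real algorithm uses a bounded window and spills overflow tuples to a temporary file across several passes, which is omitted here. The comparison counter is an illustrative cost proxy and an assumption, not the paper's cost model: the larger the skyline, the larger the window that each incoming tuple is checked against.

```python
def dominates(p, q):
    """p dominates q (smaller is better): no worse anywhere, better somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def bnl_skyline(points):
    """Block-nested-loops skyline with an unbounded in-memory window.
    Returns (skyline, number of point-vs-window comparisons)."""
    window, comparisons = [], 0
    for p in points:
        survivors, dominated = [], False
        for w in window:
            comparisons += 1
            if dominates(w, p):
                dominated = True  # p is discarded; window stays unchanged
                break
            if not dominates(p, w):
                survivors.append(w)  # w is not eliminated by p
        if not dominated:
            window = survivors + [p]  # drop window entries that p dominates
    return window, comparisons


print(bnl_skyline([(1, 1), (2, 2), (3, 3)]))  # ([(1, 1)], 2)
print(bnl_skyline([(1, 3), (2, 2), (3, 1)]))  # ([(1, 3), (2, 2), (3, 1)], 3)
```

With correlated data the window stays small. With anti-correlated data every tuple joins the skyline and the comparison count grows, which is one reason cost estimates depend on skyline cardinality.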

176 | On finding the maxima of a set of vectors
- Kung, Luccio, et al.
- 1975
Citation Context: ...7> is not returned as it is dominated by the car <18000,2004>. *2.3 Algorithms for Computing Skyline.* The problem of computing the Skyline has also been studied under the name of maximum vector problem [8]. While this work assumed that the whole set of points fit into memory, several algorithms suitable for a database system have been proposed [15, 16, 5, 7, 12]. In this paper, our focus is on algorith...
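
In the in-memory setting studied by Kung et al., the two-dimensional case reduces to one sort and one scan. The sketch below shows only this textbook 2-D special case, with smaller values better on both axes. It is not their general algorithm for higher dimensions. Exact duplicates of a skyline point are kept, because identical points do not dominate each other.

```python
import math


def skyline_2d(points):
    """O(n log n) 2-D skyline: sort by x, then keep points whose y beats
    every y seen so far (ties on x are ordered by y by the tuple sort)."""
    result, best_y = [], math.inf
    for p in sorted(points):
        _, y = p
        # An exact duplicate of the last kept point is not dominated by it.
        if y < best_y or (result and p == result[-1]):
            result.append(p)
            best_y = min(best_y, y)
    return result


cars = [(30000, 15000), (20000, 18000), (40000, 12000), (35000, 16000), (25000, 20000)]
print(skyline_2d(cars))  # [(20000, 18000), (30000, 15000), (40000, 12000)]
```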

161 | An optimal and progressive algorithm for skyline queries
- Papadias, Tao, et al.
- 2003
Citation Context: ...o been studied under the name of maximum vector problem [8]. While this work assumed that the whole set of points fit into memory, several algorithms suitable for a database system have been proposed [15, 16, 5, 7, 12]. In this paper, our focus is on algorithms that can be directly implemented in today’s commercial database systems without the addition of new access methods (which would require addressing the assoc...

154 | Practical selectivity estimation through adaptive sampling
- Lipton, Naughton, et al.
- 1990
Citation Context: ...es are perfectly anti-correlated). We present two techniques to estimate skyline size in the presence of correlations, sampling and histograms. *Sampling.* Sampling techniques for selectivity estimation [11] and query size estimation in general [10] have been known for a long time. A naive method of sampling is as follows: compute the Skyline size on a small random sample of the data and scale it linearl...
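
The sketch below is a quick simulation of why linear scaling of a sample's skyline size goes wrong. It assumes two independent uniform attributes with distinct values. Under that assumption the expected skyline size of n points is the harmonic number H_n ≈ ln n (Bentley et al.), which grows far more slowly than n. Scaling a 1% sample up by 100 therefore overestimates badly: roughly H_200 × 100 in expectation, against about H_20000 for the full data. The data, seed, and function names are hypothetical.

```python
import math
import random


def skyline_size_2d(points):
    """Skyline size of 2-D points with distinct coordinates (smaller is better)."""
    size, best_y = 0, math.inf
    for _, y in sorted(points):
        if y < best_y:
            size, best_y = size + 1, y
    return size


def naive_sample_estimate(points, fraction, rng):
    """Skyline size of a random sample, scaled linearly to the full data size."""
    sample = rng.sample(points, max(1, int(len(points) * fraction)))
    return skyline_size_2d(sample) * len(points) / len(sample)


rng = random.Random(42)
data = [(rng.random(), rng.random()) for _ in range(20000)]
true_size = skyline_size_2d(data)
estimate = naive_sample_estimate(data, 0.01, rng)
print(true_size, estimate)
```

Because a sample's skyline always has at least one point, the naive estimate here can never drop below 100.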

139 | Efficient progressive skyline computation
- Tan, Eng, et al.
- 2001
Citation Context: ... work [8, 1] on computing the Skyline was algorithmic in nature where all the data was assumed to be available in memory. Several algorithms suitable for a database system have recently been proposed [15, 16, 5, 7]. Some of the algorithms use special index structures like variants of B-trees [16] or R-trees [7, 12]. Chomicki et al. observed that a simple sorting of data before computing Skyline can substantial...
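
The presorting idea attributed to Chomicki et al. can be sketched as follows. Sort the input by a monotone score, so that no tuple can be dominated by a tuple that comes later. After that, each tuple only needs to be checked against the skyline found so far. Chomicki et al. use their own scoring function. Here the sum of the attributes is swapped in as a simpler monotone score (an assumption), and smaller values are better.

```python
def dominates(p, q):
    """p dominates q (smaller is better): no worse anywhere, better somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def sfs_skyline(points):
    """Sort-filter-skyline sketch: presort by the sum of attributes (a monotone
    score), then make one filtering pass against the skyline found so far."""
    result = []
    for p in sorted(points, key=sum):
        # A dominator has a strictly smaller sum, so it was already seen. Once
        # p survives this check it is final and could be emitted immediately.
        if not any(dominates(s, p) for s in result):
            result.append(p)
    return result


cars = [(30000, 15000), (20000, 18000), (40000, 12000), (35000, 16000), (25000, 20000)]
print(sfs_skyline(cars))  # [(20000, 18000), (30000, 15000), (40000, 12000)]
```

Checking only against `result` is enough. If an earlier, pruned tuple dominates p, the tuple that pruned it also dominates p, by transitivity.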

112 | Skyline with presorting
- Chomicki, Godfrey, et al.
- 2003
Citation Context: ...age and low price. The query returns all Toyota cars which are not “dominated”, in that there is no other Toyota car that has a lower mileage and a lower price. The Skyline operator has been proposed [15, 16, 5] to implement preference queries. Skyline takes a set of preferences as input, and returns only those tuples for which there is no other tuple that is better with respect to all preferences. While the...

86 | On the average number of maxima in a set of vectors and applications
- Bentley, Kung, et al.
- 1978
Citation Context: ...l histograms, samples, wavelets, etc. Even under the attribute value independence assumption commonly used in query optimizers, skyline cardinality estimation poses substantial challenges. Prior work [1, 3] has addressed the problem of skyline cardinality estimation but only under strong assumptions in addition to attribute value independence, such as assuming that all attributes are unique and complete...
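
The snippet notes that prior estimates assume unique attribute values. The small sketch below shows how far off that can be when duplicates exist. Identical tuples never dominate each other, so all of them stay in the skyline. Low-cardinality attributes behave similarly. The harmonic number H_n is used as the distinct-values expectation for two dimensions, and the data is hypothetical.

```python
def dominates(p, q):
    """p dominates q (smaller is better): no worse anywhere, better somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Naive skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def harmonic(n):
    """H_n: expected 2-D skyline size under independence with distinct values."""
    return sum(1.0 / i for i in range(1, n + 1))


ties = [(5, 5)] * 10                               # ten identical tuples
binary = [(0, 1), (1, 0), (0, 1), (1, 1), (1, 0)]  # two-valued attributes
print(len(skyline(ties)), harmonic(len(ties)))     # all 10 survive vs. H_10 < 3
print(skyline(binary))  # [(0, 1), (1, 0), (0, 1), (1, 0)]
```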

86 | Preference SQL: design, implementation, experiences
- Kießling, Köstler
- 2002
Citation Context: ... the window is in Skyline and can be immediately outputted. Also, only buckets need to be stored in the window rather than all data items. *3 Why Implement Skyline in the Relational Engine?* Prior work [6] has shown that while skyline does not add to the expressive power of SQL, an implementation of skyline using SQL is expensive, and that a physical implementation that is cognizant of the properties o...
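
To make the expressiveness point concrete, a two-attribute skyline can be written in plain SQL with a correlated `NOT EXISTS` subquery. The sketch below uses Python's built-in `sqlite3` module with hypothetical car data. It is ordinary SQL, not the Preference SQL syntax. Without a dedicated operator, an engine will typically evaluate this as an anti-join that compares each tuple against many others. That illustrates the expense the snippet mentions, but it does not measure it.

```python
import sqlite3

cars = [(30000, 15000), (20000, 18000), (40000, 12000), (35000, 16000), (25000, 20000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (mileage INTEGER, price INTEGER)")
conn.executemany("INSERT INTO cars VALUES (?, ?)", cars)

# Skyline (lower mileage and lower price preferred) as an anti-join:
# keep c unless some d is no worse on both attributes and better on one.
SKYLINE_SQL = """
SELECT c.mileage, c.price
FROM cars AS c
WHERE NOT EXISTS (
    SELECT 1 FROM cars AS d
    WHERE d.mileage <= c.mileage AND d.price <= c.price
      AND (d.mileage < c.mileage OR d.price < c.price)
)
ORDER BY c.mileage
"""
rows = conn.execute(SKYLINE_SQL).fetchall()
print(rows)  # [(20000, 18000), (30000, 15000), (40000, 12000)]
conn.close()
```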

81 | The Cascades framework for query optimization
- Graefe
- 1995
Citation Context: ...e the interaction of skyline with other operators. Since our implementation is in Microsoft SQL Server 2005 (Beta version), we exploit the extensible query optimizer based on the Cascades framework [4] to integrate the algebraic equivalences as transformation rules. We also make changes to the parser to expose the Preference SQL syntax. But the crucial component that yields the biggest challenge is...
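
The snippet does not say which algebraic equivalences are registered as transformation rules. As an illustration only, and not as one of the paper's rules, here is one equivalence that holds in general: Skyline(A ∪ B) = Skyline(Skyline(A) ∪ Skyline(B)), with ∪ as bag union. It holds because dominance is transitive, so any tuple pruned inside one input is also dominated by a tuple that survives in that input. The sketch checks the rewrite on random hypothetical data. In a Cascades-style optimizer this would be expressed as a rule, not as a runtime check.

```python
import random


def dominates(p, q):
    """p dominates q (smaller is better): no worse anywhere, better somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Naive skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def skyline_over_union_rewritten(a, b):
    """Skyline(A UNION ALL B) rewritten as Skyline(Skyline(A) UNION ALL Skyline(B))."""
    return skyline(skyline(a) + skyline(b))


rng = random.Random(7)
a = [(rng.randint(0, 9), rng.randint(0, 9)) for _ in range(40)]
b = [(rng.randint(0, 9), rng.randint(0, 9)) for _ in range(40)]
assert sorted(skyline(a + b)) == sorted(skyline_over_union_rewritten(a, b))
```

The small integer domain deliberately produces duplicates, and the equivalence still holds under bag semantics.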

65 | Query Size Estimation by Adaptive Sampling
- Lipton, Naughton
- 1990
Citation Context: ...ent two techniques to estimate skyline size in the presence of correlations, sampling and histograms. *Sampling.* Sampling techniques for selectivity estimation [11] and query size estimation in general [10] have been known for a long time. A naive method of sampling is as follows: compute the Skyline size on a small random sample of the data and scale it linearly to the actual data size. This does not w...

55 | Stratified computation of skylines with partially ordered domains
- Chan, Eng, et al.
- 2005
Citation Context: ...e problem of estimating the size of Skyline has been considered [1, 3] under the assumption that no two data items have the same value on any attribute and all attributes are independent. Chan et al. [2] show how to evaluate skylines efficiently over partially-ordered domains. Kossmann [15] gives some optimizations for Skyline over joins. Kießling et al. [6] propose an extension of SQL language to inc...
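
The sketch below does not reproduce Chan et al.'s method for partially ordered domains. It only shows how the dominance test changes when one attribute is a partial order instead of a number. The color preference is hypothetical: red is preferred over blue, blue over white, and green is incomparable with all of them. Lower price is better. Tuples whose colors are incomparable can never dominate each other, which tends to enlarge the skyline.

```python
from itertools import product


def transitive_closure(pairs):
    """All (better, worse) pairs implied by the given preference pairs."""
    closure = set(pairs)
    while True:
        implied = {(a, d) for (a, b), (c, d) in product(closure, repeat=2) if b == c}
        if implied <= closure:
            return closure
        closure |= implied


def make_dominates(better_color):
    """Dominance on (price, color): lower price is better; colors follow the
    partial order `better_color` (incomparable colors never dominate)."""
    def dominates(p, q):
        (price_p, color_p), (price_q, color_q) = p, q
        color_strict = (color_p, color_q) in better_color
        color_weak = color_p == color_q or color_strict
        return price_p <= price_q and color_weak and (price_p < price_q or color_strict)
    return dominates


better = transitive_closure({("red", "blue"), ("blue", "white")})
dominates = make_dominates(better)
cars = [(10000, "blue"), (10000, "green"), (12000, "red"), (11000, "white"), (9000, "white")]
skyline = [p for p in cars if not any(dominates(q, p) for q in cars)]
print(skyline)  # [(10000, 'blue'), (10000, 'green'), (12000, 'red'), (9000, 'white')]
```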

52 | Multi-dimensional selectivity estimation using compressed histogram information
- Lee, Kim, et al.
- 1999
Citation Context: ...theoretical investigation of this behavior for future work. *Histograms.* Histograms have been extensively studied in the database community for query size estimation. There have been various techniques [9, 13] for using and efficiently constructing multidimensional histograms to model attribute correlations. In the context of Skyline size estimation, we can use these structures to estimate the joint distri...

26 | Skyline cardinality for relational processing
- Godfrey
- 2004
Citation Context: ...l histograms, samples, wavelets, etc. Even under the attribute value independence assumption commonly used in query optimizers, skyline cardinality estimation poses substantial challenges. Prior work [1, 3] has addressed the problem of skyline cardinality estimation but only under strong assumptions in addition to attribute value independence, such as assuming that all attributes are unique and complete...

6 | Numerical Computation 2: Methods, Software, and Analysis
- Ueberhuber
- 1997
Citation Context: ... us an alternative expression for $s(n, d)$ that is equivalent to the solution of the recurrence in Eq (1). The integral has no closed form, but several numerical methods exist to evaluate the integral [14, 17]. Now suppose we want a Skyline when $P = \{Y_1, \ldots, Y_k, X_1, \ldots, X_d\}$, where $Y_i$ are predicate preferences and $X_i$ are numeric preferences. We assume that $Y_i$ is 1 when the predicate is satisfied and 0 o...
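
Eq (1) itself is not reproduced in this snippet. The sketch assumes it is the recurrence used in the prior work cited as [1, 3] for independent attributes with distinct values: s(n, d) = s(n - 1, d) + s(n, d - 1) / n, with s(n, 1) = 1 for n ≥ 1 and s(0, d) = 0. Under that assumption, s(n, d) can be tabulated directly by dynamic programming in O(nd) time, as an alternative to evaluating the integral numerically. The function name is hypothetical.

```python
def expected_skyline_size(n: int, d: int) -> float:
    """Expected skyline size of n points with d >= 1 independent attributes and
    no duplicate values, via s(n, d) = s(n - 1, d) + s(n, d - 1) / n."""
    if n == 0:
        return 0.0
    # row[m] holds s(m, k) for the current dimension k, for m = 0..n.
    row = [0.0] + [1.0] * n  # k = 1: a single minimum among distinct values
    for _ in range(2, d + 1):
        new = [0.0] * (n + 1)
        for m in range(1, n + 1):
            new[m] = new[m - 1] + row[m] / m
        row = new
    return row[n]


print(expected_skyline_size(4, 2))  # H_4, the 4th harmonic number
print(expected_skyline_size(2, 3))  # 1.75
```

As a sanity check, the 2-D case reduces to the harmonic number H_n. For two points in three dimensions, one dominates the other with probability 1/4, giving 2 - 1/4 = 1.75.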