## Selecting Stars: The k Most Representative Skyline Operator (2007)

### Download Links

- [www.cse.unsw.edu.au]
- DBLP

Venue: In Proc. of the Int. IEEE Conf. on Data Engineering (ICDE)

Citations: 57 (2 self-citations)

### BibTeX

@INPROCEEDINGS{Lin07selectingstars:,

author = {Xuemin Lin and Yidong Yuan and Qing Zhang and Ying Zhang},

title = {Selecting Stars: The k Most Representative Skyline Operator},

booktitle = {Proc. of the Int. IEEE Conf. on Data Engineering (ICDE)},

year = {2007}

}

### Abstract

Skyline computation has many applications, including multi-criteria decision making. In this paper, we study the problem of selecting k skyline points so that the number of points dominated by at least one of these k skyline points is maximized. We first present an efficient exact algorithm based on dynamic programming for 2-dimensional space. Then, we show that the problem is NP-hard when the dimensionality is 3 or more, and that it can be approximately solved by a polynomial-time algorithm with a guaranteed approximation ratio of 1 − 1/e. To speed up the computation, we develop an efficient, scalable, index-based randomized algorithm by applying the FM probabilistic counting technique. A comprehensive performance evaluation demonstrates that our randomized technique is very efficient, highly accurate, and scalable.
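To make the problem statement concrete, here is a minimal Python sketch that solves top-k RSP by brute force: it enumerates every k-subset S of the skyline and keeps the one maximizing |D(S)|, the number of points dominated by at least one member of S. It assumes points are tuples of numbers where smaller is better on every dimension, and all function names are hypothetical. Exhaustive search is exponential in k and is not one of the paper's algorithms; it is only a baseline for checking faster methods on tiny inputs.

```python
from itertools import combinations


def dominates(p, q):
    """p dominates q: p <= q on every dimension and p < q on at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Points not dominated by any other point (naive O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def dominated_count(chosen, points):
    """|D(S)|: how many points are dominated by at least one point in `chosen`."""
    return sum(1 for p in points if any(dominates(s, p) for s in chosen))


def top_k_rsp_bruteforce(points, k):
    """Try every k-subset of the skyline; exponential, for tiny inputs only."""
    sky = skyline(points)
    k = min(k, len(sky))
    best = max(combinations(sky, k), key=lambda S: dominated_count(S, points))
    return list(best), dominated_count(best, points)
```

For example, on the points (1, 4), (2, 2), (4, 1), (3, 3), (2, 5), (5, 5), (4, 2), the skyline is (1, 4), (2, 2), (4, 1), and k = 1 selects (2, 2), which alone dominates all four non-skyline points.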

### Citations

2221 | R-Trees: A Dynamic Index Structure for Spatial Searching
- Guttman
- 1984
Citation Context ...S; 5: return S; Lemma 2 [16]: Algorithm 1 returns an approximate solution to top-k RSP with approximation ratio 1 − 1/e (footnote: here, e is Euler's constant rather than an entry in an R-tree). In our implementation, we assume that the dataset P is indexed by an R-tree [2, 14]. We use BBS to compute SP in Step 1. Then, for each p ∈ P we compute {D({s}) : ∀s ∈ SP} by a window query per data point p ∈ P − SP a...

1886 | An Introduction to Probability Theory and its Applications
- Feller
- 1968
Citation Context ...d to Fm_i is defined in the same way as min(B) related to B. As shown in [12], E(min(Fm_i)) = E(min(Fm_j)) (1 ≤ i < j ≤ Ϝ); this, together with Theorem 2 in [12] and the Central Limit Theorem (p. 229 in [11]), immediately leads to the following theorem by the independence assumption. Theorem 1. Let n be the number of distinct elements in P and A be the estimate of the FM algorithm as shown in (1). For a given...

981 | The R*-tree: an efficient and robust access method for points and rectangles
- Beckmann, Kriegel, et al.
- 1990
Citation Context ... accurate, and scalable. 1. Introduction Given a set of d-dimensional points, the skyline consists of the points, called “skyline points”, which are not dominated by another point. A point p = (p[1], p[2], ..., p[d]) dominates another point q = (q[1], q[2], ..., q[d]) iff p[i] ≤ q[i] (for 1 ≤ i ≤ d) and there is at least one dimension j such that p[j] < q[j]. The skyline computation (or the skyline operato...
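The dominance test defined in this excerpt translates directly into code. The Python sketch below pairs it with a simple presorting skyline filter: if q dominates p, the coordinate sum of q is strictly smaller, so after sorting by sum, whenever a point is dominated, some skyline point that dominates it has already been accepted into the window. Sorting by coordinate sum is one choice of monotone presort made for this illustration; it is not BBS, the R-tree based algorithm the paper builds on, and the function names are hypothetical.

```python
def dominates(p, q):
    """p dominates q: p[i] <= q[i] for all i and p[j] < q[j] for some j."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Skyline via presorting on the coordinate sum (smaller is better)."""
    window = []  # skyline points found so far
    # A dominator always has a strictly smaller sum, so it is processed first.
    for p in sorted(points, key=sum):
        if not any(dominates(s, p) for s in window):
            window.append(p)
    return window
```

Because only accepted skyline points are kept in the window, each new point is compared against the current skyline rather than against the whole dataset.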

484 | Nearest neighbor queries
- Roussopoulos, Kelley, et al.
- 1995
Citation Context ...whole dataset. Two auxiliary data structures are proposed: bitmap and search tree. Kossmann et al. [22] present another progressive technique based on the nearest neighbour search technique on R-trees [32, 15], which adopts a divide-and-conquer paradigm on the dataset indexed by an R-tree. Papadias et al. [29] propose a branch and bound search technique (BBS) to progressively output skyline points on datasets...

389 | The skyline operator
- Börzsönyi, Kossmann, et al.
- 2001
Citation Context ...ecently received a great deal of attention in the database community. A number of efficient algorithms for computing all skyline points (i.e. full skyline) have been reported in the literature [4, 9, 13, 22, 29, 33]. It has been shown in [3, 13] that the expected number of skyline points is Θ((ln n)^(d−1)/(d − 1)!) for a random dataset. With the presence of a possibly large number of skyline points, the full skyline...

340 | Probabilistic counting algorithms for data base applications
- Flajolet, Martin
- 1985
Citation Context ... be scalable nor efficient. We develop a novel, efficient, scalable, index-based randomized algorithm, with a theoretical accuracy guarantee, by using a probabilistic counting technique, the FM algorithm [12]. Besides theoretical analysis, an extensive experimental evaluation demonstrates that our randomized algorithm is both time- and space-efficient, as well as highly accurate. The rest of the paper is...

323 | Efficient processing of spatial joins using R-trees
- Brinkhoff, Kriegel, et al.
- 1993
Citation Context ...For example, in Figure 6 e4 fully dominates e1, partially dominates e2, and does not dominate e3. We treat UpdateSketch (e H2 ,S) as one kind of spatial join [5, 17]. We apply the R-tree traversal paradigms from [5, 17, 28]. To avoid reading a data entry e from disk more than once, we group the entries, at which FM sketch sets may need to be updated by the points...

293 | Distance Browsing in Spatial Databases
- Hjaltason, Samet
- 1999
Citation Context ...whole dataset. Two auxiliary data structures are proposed: bitmap and search tree. Kossmann et al. [22] present another progressive technique based on the nearest neighbour search technique on R-trees [32, 15], which adopts a divide-and-conquer paradigm on the dataset indexed by an R-tree. Papadias et al. [29] propose a branch and bound search technique (BBS) to progressively output skyline points on datasets...

194 | Shooting stars in the sky: An online algorithm for skyline queries
- Kossmann, Ramsak, et al.
- 2002
Citation Context ...ecently received a great deal of attention in the database community. A number of efficient algorithms for computing all skyline points (i.e. full skyline) have been reported in the literature [4, 9, 13, 22, 29, 33]. It has been shown in [3, 13] that the expected number of skyline points is Θ((ln n)^(d−1)/(d − 1)!) for a random dataset. With the presence of a possibly large number of skyline points, the full skyline...

176 | On finding the maxima of a set of vectors
- Kung, Luccio, et al.
- 1975
Citation Context ...s the solution. In this paper, we study the problem of efficiently computing top-k RSP. 2.2. Related work Computing full skyline. Efficiently computing skyline was first investigated by Kung et al. in [23]. Bentley et al. [3] provide an efficient algorithm with an expected linear running time if the data distribution on each dimension is independent. Börzsönyi et al. [4] first investigate the skyline c...

161 | An optimal and progressive algorithm for skyline queries
- Papadias, Tao, et al.
- 2003
Citation Context ...ecently received a great deal of attention in the database community. A number of efficient algorithms for computing all skyline points (i.e. full skyline) have been reported in the literature [4, 9, 13, 22, 29, 33]. It has been shown in [3, 13] that the expected number of skyline points is Θ((ln n)^(d−1)/(d − 1)!) for a random dataset. With the presence of a possibly large number of skyline points, the full skyline...

139 | Efficient progressive skyline computation
- Tan, Eng, et al.
- 2001

137 | Optimal histograms with quality guarantees
- Jagadish, Koudas, et al.
- 1998
Citation Context ...the exact solution of top-k RSP with respect to the whole set of skyline points. Clearly, OPT(k) = max_{1≤i≤m} {opt(s_i, k)} (3). Based on formulae (2) and (3), the V-optimal dynamic programming technique [19] can be immediately used to solve our top-k RSP. The algorithm runs in O(km²) time if each |∆(s_i, s_j)| (for 1 ≤ j < i ≤ m) is pre-computed and the skyline points are also pre-computed. In the followin...
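Formula (2) itself is cut off in this excerpt. The Python sketch below assumes the natural form opt(s_i, c) = max over j < i of opt(s_j, c − 1) + |Δ(s_i, s_j)|, with Δ(s_i, s_j) = D(s_i) − D(s_j) and the skyline sorted by increasing x (hence decreasing y); with (3), OPT(k) = max_i opt(s_i, k). That form is an assumption, not the paper's text. It is valid because, in 2-d, any point dominated by both s_a and s_c with a < b < c is also dominated by s_b, so the union of the chosen dominance regions grows by exactly |Δ| at each step. Smaller values are assumed better, and the names are hypothetical.

```python
def dominates(p, q):
    """2-d dominance, smaller is better: p <= q on both axes and p != q."""
    return p[0] <= q[0] and p[1] <= q[1] and p != q


def top_k_rsp_2d(points, k):
    """Exact top-k RSP in 2-d by dynamic programming over the x-sorted skyline."""
    pts = list(points)
    # Skyline sorted by x ascending, so y is descending along the list.
    sky = sorted(p for p in pts if not any(dominates(q, p) for q in pts))
    m, k = len(sky), min(k, len(sky))
    if k == 0:
        return [], 0
    # dom[i]: indices of the points dominated by sky[i], i.e. D({s_i}).
    dom = [frozenset(j for j, p in enumerate(pts) if dominates(s, p)) for s in sky]
    NEG = float("-inf")
    # opt[c][i]: best |D(S)| with |S| = c and sky[i] the largest-x member of S.
    opt = [[NEG] * m for _ in range(k + 1)]
    back = [[-1] * m for _ in range(k + 1)]
    for i in range(m):
        opt[1][i] = len(dom[i])
    for c in range(2, k + 1):
        for i in range(m):
            for j in range(i):
                if opt[c - 1][j] == NEG:
                    continue
                # |Delta(s_i, s_j)| = |D(s_i) - D(s_j)|: what s_i adds after s_j.
                val = opt[c - 1][j] + len(dom[i] - dom[j])
                if val > opt[c][i]:
                    opt[c][i], back[c][i] = val, j
    # OPT(k) = max_i opt(s_i, k); then follow back-pointers to recover S.
    i = max(range(m), key=lambda t: opt[k][t])
    best, chosen, c = opt[k][i], [], k
    while i != -1:
        chosen.append(sky[i])
        i, c = back[c][i], c - 1
    return chosen[::-1], best
```

The DP performs O(km²) steps, matching the quoted bound, but this sketch recomputes each |Δ(s_i, s_j)| as a set difference instead of pre-computing it as the excerpt assumes, so it is only meant for small inputs.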

120 | Approximation algorithms for the set covering and vertex cover problems
- Hochbaum
- 1982
Citation Context ...ts in its corresponding cells. Therefore, the problem of computing top-k RSP is also NP-hard. 5.2. Greedy Algorithm In fact, top-k RSP can be immediately transformed into the maximum coverage problem [16]; consequently, it can be solved approximately by a greedy heuristic. Below in Algorithm 1, we present the greedy heuristic. Note that D(S) is the set of points each of which is dominated by at least o...
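Since top-k RSP maps to maximum coverage, with each skyline point s covering D({s}), the standard greedy heuristic applies: k times, pick the skyline point that dominates the most points not yet covered. The Python sketch below is a plain in-memory version under that reading (smaller is better; names hypothetical). It is not the paper's Algorithm 1 verbatim, nor its index-based randomized variant. For maximum coverage, greedy is known to achieve at least (1 − 1/e) of the optimum.

```python
def dominates(p, q):
    """p dominates q: p <= q on every dimension and p < q on at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def greedy_top_k_rsp(points, k):
    """Greedy maximum-coverage heuristic for top-k RSP; returns (S, |D(S)|)."""
    # Distinct skyline points; dict.fromkeys keeps first-seen order.
    sky = list(dict.fromkeys(p for p in points if not any(dominates(q, p) for q in points)))
    # D({s}) for every skyline point s, as a set of indices into `points`.
    cover = {s: {i for i, p in enumerate(points) if dominates(s, p)} for s in sky}
    chosen, covered = [], set()
    for _ in range(min(k, len(sky))):
        # Take the skyline point that dominates the most not-yet-covered points.
        best = max((s for s in sky if s not in chosen), key=lambda s: len(cover[s] - covered))
        chosen.append(best)
        covered |= cover[best]
    return chosen, len(covered)
```

Each round costs one pass over the remaining skyline points, so the heuristic is polynomial, in contrast to the exhaustive search it approximates.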

112 | Skyline with presorting
- Chomicki, Godfrey, et al.
- 2003

93 | Efficient distributed skylining for web information systems
- Balke, Güntzer, et al.
- 2004
Citation Context ...ighly accurate, and scalable. 1. Introduction Given a set of d-dimensional points, the skyline consists of the points, called “skyline points”, which are not dominated by another point. A point p = (p[1], p[2], ..., p[d]) dominates another point q = (q[1], q[2], ..., q[d]) iff p[i] ≤ q[i] (for 1 ≤ i ≤ d) and there is at least one dimension j such that p[j] < q[j]. The skyline computation (or the skyline op...

88 | Spatial joins using R-trees: Breadth-first traversal with global optimizations
- Huang, Jing, et al.
- 1997
Citation Context ...For example, in Figure 6 e4 fully dominates e1, partially dominates e2, and does not dominate e3. We treat UpdateSketch (e H2 ,S) as one kind of spatial join [5, 17]. We apply the R-tree traversal paradigms from [5, 17, 28]. To avoid reading a data entry e from disk more than once, we group the entries, at which FM sketch sets may need to be updated by the points...

86 | On the average number of maxima in a set of vectors and applications
- Bentley, Kung, et al.
- 1978
Citation Context ... the database community. A number of efficient algorithms for computing all skyline points (i.e. full skyline) have been reported in the literature [4, 9, 13, 22, 29, 33]. It has been shown in [3, 13] that the expected number of skyline points is Θ((ln n)^(d−1)/(d − 1)!) for a random dataset. With the presence of a possibly large number of skyline points, the full skyline may be less informative. In t...

68 | Catching the best views of skyline: A semantic approach based on decisive subspaces
- Pei, Jin, et al.
- 2005
Citation Context ...ing variations of skyline computation. These include computing skyline in a distributed environment [1, 18], continuously processing skyline queries in data streams [26, 34], skyline cube computation [30, 37] and its dynamic maintenance [36], computing skyline efficiently in a subspace [35], effectively materializing dominance relationships [24], and multi-source skyline query processing [10]. 3. Prelimin...

65 | Stabbing the sky: Efficient skyline computation over sliding windows
- Lin, Yuan, et al.
- 2005
Citation Context ...ch results in the literature regarding variations of skyline computation. These include computing skyline in a distributed environment [1, 18], continuously processing skyline queries in data streams [26, 34], skyline cube computation [30, 37] and its dynamic maintenance [36], computing skyline efficiently in a subspace [35], effectively materializing dominance relationships [24], and multi-source skyline...

59 | Maximal vector computation in large data sets
- Godfrey, Shipley, et al.
- 2005

55 | Stratified computation of skylines with partially ordered domains
- Chan, Eng, et al.
- 2005
Citation Context ...t guarantees the minimum I/O costs. Kapoor [20] studies the problem of dynamically maintaining an effective data structure for an incremental skyline computation in a 2-dimensional space. Chan et al. [6] investigate the skyline computation problem for partially ordered value domains. Data points with designated dominance properties. Observe that the number of skyline points may be large; thus the full...

54 | Finding k-dominant skyline in high dimensional space
- Chan, Jagadish, et al.
Citation Context ...tions (hotels). Moreover, the top-k RSP also provides a novel ranking mechanism for top-k queries. Selecting data points with certain designated dominance properties has been recently investigated in [7, 8, 21, 29]. Nevertheless, to the best of our knowledge our top-k RSP problem is novel and it is inherently different from the problems in [7, 8, 21, 29]; consequently these existing techniques are not applicabl...

49 | Efficient computation of the skyline cube
- Yuan, Lin, et al.
- 2005
Citation Context ...ing variations of skyline computation. These include computing skyline in a distributed environment [1, 18], continuously processing skyline queries in data streams [26, 34], skyline cube computation [30, 37] and its dynamic maintenance [36], computing skyline efficiently in a subspace [35], effectively materializing dominance relationships [24], and multi-source skyline query processing [10]. 3. Prelimin...

41 | Maintaining sliding window skylines on data streams
- Tao, Papadias
Citation Context ...ch results in the literature regarding variations of skyline computation. These include computing skyline in a distributed environment [1, 18], continuously processing skyline queries in data streams [26, 34], skyline cube computation [30, 37] and its dynamic maintenance [36], computing skyline efficiently in a subspace [35], effectively materializing dominance relationships [24], and multi-source skyline...

37 | Skyline queries against mobile lightweight devices in MANETs
- Huang, Jensen, et al.
- 2006
Citation Context ... RSP. Other Related Work. There have also been a number of research results in the literature regarding variations of skyline computation. These include computing skyline in a distributed environment [1, 18], continuously processing skyline queries in data streams [26, 34], skyline cube computation [30, 37] and its dynamic maintenance [36], computing skyline efficiently in a subspace [35], effectively ma...

36 | On high dimensional skylines
- Chan, Jagadish, et al.
Citation Context ...tions (hotels). Moreover, the top-k RSP also provides a novel ranking mechanism for top-k queries. Selecting data points with certain designated dominance properties has been recently investigated in [7, 8, 21, 29]. Nevertheless, to the best of our knowledge our top-k RSP problem is novel and it is inherently different from the problems in [7, 8, 21, 29]; consequently these existing techniques are not applicabl...

33 | Approximately dominating representatives
- Koltun, Papadimitriou
Citation Context ...tions (hotels). Moreover, the top-k RSP also provides a novel ranking mechanism for top-k queries. Selecting data points with certain designated dominance properties has been recently investigated in [7, 8, 21, 29]. Nevertheless, to the best of our knowledge our top-k RSP problem is novel and it is inherently different from the problems in [7, 8, 21, 29]; consequently these existing techniques are not applicabl...

33 | Refreshing the sky: The compressed skycube with efficient support for frequent updates
- Xia, Zhang
- 2006
Citation Context .... These include computing skyline in a distributed environment [1, 18], continuously processing skyline queries in data streams [26, 34], skyline cube computation [30, 37] and its dynamic maintenance [36], computing skyline efficiently in a subspace [35], effectively materializing dominance relationships [24], and multi-source skyline query processing [10]. 3. Preliminaries We present briefly the skyl...

32 | SUBSKY: Efficient computation of skylines in subspaces
- Tao, Xiao, et al.
- 2006
Citation Context ... environment [1, 18], continuously processing skyline queries in data streams [26, 34], skyline cube computation [30, 37] and its dynamic maintenance [36], computing skyline efficiently in a subspace [35], effectively materializing dominance relationships [24], and multi-source skyline query processing [10]. 3. Preliminaries We present briefly the skyline computation algorithm, BBS [29], as well as a ...

31 | Processing and optimization of multiway spatial joins using R-trees
- Papadias, Mamoulis, et al.
- 1999
Citation Context ... in Figure 6 e4 fully dominates e1, partially dominates e2, and does not dominate e3. We treat UpdateSketch (e H2 ,S) as one kind of spatial join [5, 17]. We apply the R-tree traversal paradigms from [5, 17, 28]. To avoid reading a data entry e from disk more than once, we group the entries, at which FM sketch sets may need to be updated by the points or child entries of e, of in-memory R FM -tree for skylin...

30 | DADA: A Data Cube for Dominant Relationship Analysis
- Li, Ooi, et al.
- 2006
Citation Context ...eries in data streams [26, 34], skyline cube computation [30, 37] and its dynamic maintenance [36], computing skyline efficiently in a subspace [35], effectively materializing dominance relationships [24], and multi-source skyline query processing [10]. 3. Preliminaries We present briefly the skyline computation algorithm, BBS [29], as well as a probabilistic algorithm, FM [12], for counting distinct ...

16 | Multi-source skyline query processing in road networks
- Deng, Zhou, et al.
- 2007
Citation Context ...putation [30, 37] and its dynamic maintenance [36], computing skyline efficiently in a subspace [35], effectively materializing dominance relationships [24], and multi-source skyline query processing [10]. 3. Preliminaries We present briefly the skyline computation algorithm, BBS [29], as well as a probabilistic algorithm, FM [12], for counting distinct data elements. They will be employed in our appr...

13 | Dynamic maintenance of maxima of 2-d point sets
- Kapoor
- 1994
Citation Context ...nd search technique (BBS) to progressively output skyline points on datasets indexed by an R-tree. One of the most important properties of BBS in [29] is that it guarantees the minimum I/O costs. Kapoor [20] studies the problem of dynamically maintaining an effective data structure for an incremental skyline computation in a 2-dimensional space. Chan et al. [6] investigate the skyline computation problem...

6 | Introduction to Spatial Databases: Applications to GIS
- Rigaux, Scholl, et al.
- 2000
Citation Context .... 4. Two-dimensional Space We investigate the problem of the top-k RSP in a 2d-space. We first show that the problem can be solved by a dynamic programming algorithm. Then, we develop a sweepline technique [31] to efficiently compute the parameters needed in the dynamic programming algorithm. 4.1. Dynamic Programming Based Algorithm Suppose that {s1, s2, ..., sm} is a collection of skyline points in a 2d-space...

1 | Summarizing level-two topological relations in large spatial datasets
- Lin, Liu, et al.
Citation Context ... shaded area in Figure 4(a)) of the plane x + y + z = 1 into grid cells (see Figure 4(b)) such that each grid cell has at most 3 data points. According to the proofs of Theorem 3.3 and Theorem 4.1 in [25], the problem of finding k grid cells to contain the maximum number of points is NP-hard in this setting. For a cell containing at least one data point, we choose 3 grid points including all da...