Results 1 - 10 of 44
Efficient Computation of the Skyline Cube
- In VLDB, 2005
Cited by 41 (3 self)
Skyline has been proposed as an important operator for multi-criteria decision making, data mining and visualization, and user-preference queries. In this paper, we consider the problem of efficiently computing a Skycube, which consists of skylines of all possible non-empty subsets of a given set of dimensions. While existing skyline computation algorithms can be immediately extended to computing each skyline query independently, such "shared-nothing" algorithms are inefficient. We develop several computation sharing strategies based on effectively identifying the computation dependencies among multiple related skyline queries. Based on these sharing strategies, two novel algorithms, Bottom-Up and Top-Down algorithms, are proposed to compute Skycube efficiently. Finally, our extensive performance evaluations confirm the effectiveness of the sharing strategies. It is
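To make the Skycube definition concrete, the following is a minimal sketch of the naive "shared-nothing" baseline that the abstract calls inefficient: one independent skyline per non-empty subset of dimensions. It is not the paper's Bottom-Up or Top-Down algorithm. The function names, the hotel data, and the convention that smaller values are better are all assumptions made for illustration.

```python
from itertools import combinations

def dominates(p, q, dims):
    """True if p dominates q on dims (assumption: smaller values are better)."""
    return all(p[d] <= q[d] for d in dims) and any(p[d] < q[d] for d in dims)

def skyline(points, dims):
    """Points that no other point dominates in the subspace dims."""
    return [p for p in points if not any(dominates(q, p, dims) for q in points)]

def naive_skycube(points, n_dims):
    """Skyline of every non-empty subset of dimensions, each computed independently."""
    cube = {}
    for size in range(1, n_dims + 1):
        for dims in combinations(range(n_dims), size):
            cube[dims] = skyline(points, dims)
    return cube

# Hypothetical data: (price, distance to beach), lower is better on both.
hotels = [(50, 8), (80, 2), (60, 9), (90, 1)]
cube = naive_skycube(hotels, 2)
```

With d dimensions this runs 2^d - 1 separate skyline computations, which is the redundancy the sharing strategies in the abstract aim to avoid.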
Probabilistic skylines on uncertain data
- In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB’07), Vienna, 2007
Cited by 39 (10 self)
Uncertain data are inherent in some important applications. Although a considerable amount of research has been dedicated to modeling uncertain data and answering some types of queries on uncertain data, how to conduct advanced analysis on uncertain data remains an open problem at large. In this paper, we tackle the problem of skyline analysis on uncertain data. We propose a novel probabilistic skyline model where an uncertain object may take a probability to be in the skyline, and a p-skyline contains all the objects whose skyline probabilities are at least p. Computing probabilistic skylines on large uncertain data sets is challenging. We develop two efficient algorithms. The bottom-up algorithm computes the skyline probabilities of some selected instances of uncertain objects, and uses those instances to prune other instances and uncertain objects effectively. The top-down algorithm recursively partitions the instances of uncertain objects into subsets, and prunes subsets and objects aggressively. Our experimental results on both the real NBA player data set and the benchmark synthetic data sets show that probabilistic skylines are interesting and useful, and our two algorithms are efficient on large data sets, and complementary to each other in performance.
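The p-skyline definition can be illustrated with a brute-force sketch. The abstract does not spell out the probability model, so the code assumes that each uncertain object is a list of equally likely instances, that objects are independent, and that smaller values are better. The names and data are hypothetical, and this is neither the bottom-up nor the top-down algorithm from the paper.

```python
def dominates(p, q):
    """True if p dominates q (assumption: smaller values are better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline_probability(obj, others):
    """Average, over the instances of obj, of the probability that no other
    object takes an instance dominating it (assumes uniform, independent instances)."""
    total = 0.0
    for u in obj:
        prob = 1.0
        for other in others:
            dominating = sum(dominates(v, u) for v in other)
            prob *= 1.0 - dominating / len(other)
        total += prob
    return total / len(obj)

def p_skyline(objects, p):
    """Names of the objects whose skyline probability is at least p."""
    result = []
    for name, instances in objects.items():
        others = [inst for other, inst in objects.items() if other != name]
        if skyline_probability(instances, others) >= p:
            result.append(name)
    return result

# Hypothetical uncertain objects, each with equally likely instances.
objects = {
    "A": [(1, 1), (4, 4)],
    "B": [(2, 2), (3, 3)],
    "C": [(5, 5)],
}
answer = p_skyline(objects, 0.5)
```

In this example A and B each have skyline probability 0.5 and C has 0, so the 0.5-skyline contains A and B.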
Selecting Stars: The k Most Representative Skyline Operator
- In Proc. of the Int. IEEE Conf. on Data Engineering (ICDE), 2007
Cited by 39 (1 self)
Skyline computation has many applications including multi-criteria decision making. In this paper, we study the problem of selecting k skyline points so that the number of points, which are dominated by at least one of these k skyline points, is maximized. We first present an efficient dynamic programming based exact algorithm in a 2d-space. Then, we show that the problem is NP-hard when the dimensionality is 3 or more and it can be approximately solved by a polynomial time algorithm with the guaranteed approximation ratio 1 − 1/e. To speed up the computation, an efficient, scalable, index-based randomized algorithm is developed by applying the FM probabilistic counting technique. A comprehensive performance evaluation demonstrates that our randomized technique is very efficient, highly accurate, and scalable.
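The selection problem described above is an instance of maximum coverage, for which the standard greedy heuristic is known to reach a 1 − 1/e ratio. The sketch below uses that greedy heuristic as an assumption; the abstract does not state which polynomial-time algorithm the paper uses. It is not the dynamic programming or FM-based randomized method, and all names and data are hypothetical.

```python
def dominates(p, q):
    """True if p dominates q (assumption: smaller values are better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def greedy_representatives(points, k):
    """Pick up to k skyline points, each time taking the one that dominates
    the most points not yet covered (standard greedy for maximum coverage)."""
    sky = [p for p in points if not any(dominates(q, p) for q in points)]
    sky = list(dict.fromkeys(sky))  # drop duplicate points, keep order
    covered_by = {
        s: {i for i, q in enumerate(points) if dominates(s, q)} for s in sky
    }
    chosen, covered = [], set()
    for _ in range(min(k, len(sky))):
        candidates = [s for s in sky if s not in chosen]
        best = max(candidates, key=lambda s: len(covered_by[s] - covered))
        chosen.append(best)
        covered |= covered_by[best]
    return chosen, len(covered)

# Hypothetical 2D data set.
points = [(1, 9), (2, 7), (3, 8), (4, 4), (5, 6), (6, 5), (9, 1), (8, 3)]
chosen, n_covered = greedy_representatives(points, 2)
```

Here the greedy pass first picks (4, 4), which dominates two points, and then (2, 7), which adds one more, so three non-skyline points are covered in total.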
On High Dimensional Skylines
- EDBT, 2006
Cited by 29 (4 self)
In many decision-making applications, the skyline query is frequently used to find a set of dominating data points (called skyline points) in a multidimensional dataset. In a high-dimensional space skyline points no longer offer any interesting insights as there are too many of them. In this paper, we introduce a novel metric, called skyline frequency, that compares and ranks the interestingness of data points based on how often they are returned in the skyline when different numbers of dimensions (i.e., subspaces) are considered. Intuitively, a point with a high skyline frequency is more interesting as it can be dominated on fewer combinations of the dimensions. Thus, the problem becomes one of finding top-k frequent skyline points. But the algorithms thus far proposed for skyline computation typically do not scale well with dimensionality. Moreover, frequent skyline computation requires that skylines be computed for each of an exponential number of subsets of the dimensions. We present efficient approximate algorithms to address these twin difficulties. Our extensive performance study shows that our approximate algorithm can run fast and compute the correct result on large data sets in high-dimensional spaces.
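As an illustration of the skyline frequency metric, the sketch below counts exactly, by brute force, the number of non-empty subspaces in which each point belongs to the skyline. This is the exponential computation that the abstract says motivates approximation, not the paper's approximate algorithm. The names, the data, and the smaller-is-better convention are assumptions.

```python
from itertools import combinations

def dominates(p, q, dims):
    """True if p dominates q on dims (assumption: smaller values are better)."""
    return all(p[d] <= q[d] for d in dims) and any(p[d] < q[d] for d in dims)

def skyline_frequency(points, n_dims):
    """For each point, the number of non-empty subspaces whose skyline contains it."""
    freq = {p: 0 for p in points}
    for size in range(1, n_dims + 1):
        for dims in combinations(range(n_dims), size):
            for p in points:
                if not any(dominates(q, p, dims) for q in points):
                    freq[p] += 1
    return freq

def top_k_frequent(points, n_dims, k):
    """The k points with the highest skyline frequency."""
    freq = skyline_frequency(points, n_dims)
    return sorted(freq, key=lambda p: -freq[p])[:k]

# Hypothetical 3D data set; there are 2**3 - 1 = 7 subspaces to check.
points = [(1, 5, 3), (2, 2, 2), (3, 3, 3), (4, 4, 1)]
freq = skyline_frequency(points, 3)
best = top_k_frequent(points, 3, 1)
```

In this example (2, 2, 2) appears in the skyline of 5 of the 7 subspaces, while (3, 3, 3) appears in none because (2, 2, 2) dominates it in every subspace.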
Parallelizing skyline queries for scalable distribution
- In EDBT’06, 2006
Cited by 24 (2 self)
Skyline queries help users make intelligent decisions over complex data, where different and often conflicting criteria are considered. Current skyline computation methods are restricted to centralized query processors, limiting scalability and imposing a single point of failure. In this paper, we address the problem of parallelizing skyline query execution over a large number of machines by leveraging content-based data partitioning. We present a novel distributed skyline query processing algorithm (DSL) that discovers skyline points progressively. We propose two mechanisms, recursive region partitioning and dynamic region encoding, to enforce a partial order on query propagation in order to pipeline query execution. Our analysis shows that DSL is optimal in terms of the total number of local query invocations across all machines. In addition, simulations and measurements of a deployed system show that our system load balances communication and processing costs across cluster machines, providing incremental scalability and significant performance improvement over alternative distribution mechanisms.
SUBSKY: Efficient computation of skylines in subspaces
- In ICDE, 2006
Cited by 19 (5 self)
Given a set of multi-dimensional points, the skyline contains the best points according to any preference function that is monotone on all axes. In practice, applications that require skyline analysis usually provide numerous candidate attributes, and various users depending on their interests may issue queries regarding different (small) subsets of the dimensions. Formally, given a relation with a large number (e.g., > 10) of attributes, a query aims at finding the skyline in an arbitrary subspace with a low dimensionality (e.g., 2). The existing algorithms do not support subspace skyline retrieval efficiently because they (i) require scanning the entire database at least once, or (ii) are optimized for one particular subspace but incur significant overhead for other subspaces. In this paper, we propose a technique SUBSKY which settles the problem using a single B-tree, and can be implemented in any relational database. The core of SUBSKY is a transformation that converts multi-dimensional data to 1D values, and enables several effective pruning heuristics. Extensive experiments with real data confirm that SUBSKY outperforms alternative approaches significantly in both efficiency and scalability.
Efficient skyline query processing on peer-to-peer networks
- In IEEE International Conference on Data Engineering (ICDE), 2007
Cited by 17 (2 self)
Skyline query has been gaining much interest in database research communities in recent years. Most existing studies focus mainly on centralized systems, and resolving the problem in a distributed environment such as a peer-to-peer (P2P) network is still an emerging topic. The desiderata of efficient skyline querying in a P2P environment include: 1) progressive returning of answers, 2) low processing cost in terms of number of peers accessed and search messages, 3) balanced query loads among the peers. In this paper, we propose a solution that satisfies the three desiderata. Our solution is based on a balanced tree structured P2P network. By partitioning the skyline search space adaptively based on query accessing patterns, we are able to alleviate the problem of “hot” spots present in the skyline query processing. By being able to estimate the peer nodes within the query subspaces, we are able to control the amount of query forwarding, limiting the number of peers involved and the amount of messages transmitted in the network. Load balancing is achieved in query load conscious data space splitting/merging during the joining/departure of nodes and through dynamic load migration. Experiments on real and synthetic datasets confirm the effectiveness and scalability of our algorithm on P2P networks.
Deltasky: Optimal maintenance of skyline deletions without exclusive dominance region generation
- In UCSB Tech Report, 2006. http://www.cs.ucsb.edu/~pingwu/deltasky.pdf, 2007
Cited by 15 (1 self)
This paper addresses the problem of efficient maintenance of a materialized skyline view in response to skyline removals. While there has been significant progress on skyline query computation, an equally important but largely unanswered issue is on the incremental maintenance for skyline deletions. Previous work suggested the use of the so-called exclusive dominance region (EDR) to achieve optimal I/O performance for deletion maintenance. However, the shape of an EDR becomes extremely complex in higher dimensions, and algorithms for its computation have not been developed. We derive a systematic way to decompose a d-dimensional EDR into a collection of hyper-rectangles. We show that the number of such hyper-rectangles is O(m^d), where m is the current skyline result size. We then propose a novel algorithm DeltaSky which determines whether an intermediate R-tree MBR intersects with the EDR without explicitly calculating the EDR itself. This reduces the worst-case complexity of the EDR intersection check from O(m^d) to O(md). Thus DeltaSky helps the branch and bound skyline algorithm achieve I/O optimality for deletion maintenance by finding only the newly appeared skyline points after the deletion. We discuss implementation issues and show that DeltaSky can be efficiently implemented using one extra B-Tree. Moreover, we propose two optimization techniques which further reduce the average cost in practice. Extensive experiments demonstrate that DeltaSky achieves orders of magnitude performance gain over alternative solutions.
Towards Multidimensional Subspace Skyline Analysis
Cited by 14 (3 self)
The skyline operator is important for multicriteria decision-making applications. Although many recent studies developed efficient methods to compute skyline objects in a given space, none of them considers skylines in multiple subspaces simultaneously. More importantly, the fundamental
Algorithms and Analyses for Maximal Vector Computation
Cited by 14 (0 self)
The maximal vector problem is to identify the maximals over a collection of vectors. This arises in many contexts and, as such, has been well studied. The problem recently gained renewed attention with skyline queries for relational databases and with work to develop skyline algorithms that are external and relationally well behaved. While many algorithms have been proposed, how they perform has been unclear. We study the performance of, and design choices behind, these algorithms. We prove runtime bounds based on the number of vectors n and the dimensionality k. Early algorithms based on divide-and-conquer established seemingly good average and worst-case asymptotic runtimes. In fact, the problem can be solved in O(n) average-case (holding k as fixed). We prove, however, that the performance is quite bad with respect to k. We demonstrate that the more recent skyline algorithms are better behaved, and can also achieve O(kn) average-case. While k matters for these, in practice, its effect vanishes in the asymptotic. We introduce a new external algorithm, LESS, that is more efficient and better behaved. We evaluate LESS’s effectiveness and improvement over the field, and prove that its average-case running time is O(kn).

