Probabilistic skylines on uncertain data
In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB’07), Vienna, 2007
Cited by 63 (16 self)
Uncertain data are inherent in some important applications. Although a considerable amount of research has been dedicated to modeling uncertain data and answering some types of queries on uncertain data, how to conduct advanced analysis on uncertain data remains an open problem at large. In this paper, we tackle the problem of skyline analysis on uncertain data. We propose a novel probabilistic skyline model where an uncertain object may take a probability to be in the skyline, and a p-skyline contains all the objects whose skyline probabilities are at least p. Computing probabilistic skylines on large uncertain data sets is challenging. We develop two efficient algorithms. The bottom-up algorithm computes the skyline probabilities of some selected instances of uncertain objects, and uses those instances to prune other instances and uncertain objects effectively. The top-down algorithm recursively partitions the instances of uncertain objects into subsets, and prunes subsets and objects aggressively. Our experimental results on both the real NBA player data set and the benchmark synthetic data sets show that probabilistic skylines are interesting and useful, and our two algorithms are efficient on large data sets, and complementary to each other in performance.
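The p-skyline definition in this abstract can be illustrated with a brute-force sketch. It is not the paper's bottom-up or top-down algorithm. It assumes that each uncertain object is a list of equally likely instances, that objects are independent, and that smaller values are better; the names skyline_probability and p_skyline and the sample data are hypothetical.

```python
from math import prod

def dominates(a, b):
    """True if a is no worse than b in every dimension and strictly better in one (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline_probability(name, objects):
    """Average, over the instances of `name`, of the chance that no other object dominates the instance.

    Assumption: instances of an object are equally likely and objects are independent.
    """
    instances = objects[name]
    total = 0.0
    for u in instances:
        total += prod(
            1 - sum(dominates(v, u) for v in other) / len(other)
            for other_name, other in objects.items()
            if other_name != name
        )
    return total / len(instances)

def p_skyline(objects, p):
    """Names of all objects whose skyline probability is at least p."""
    return sorted(name for name in objects if skyline_probability(name, objects) >= p)

# Hypothetical data: three uncertain objects with two instances each.
objects = {
    "A": [(1, 4), (2, 3)],
    "B": [(2, 5), (4, 1)],
    "C": [(5, 5), (6, 6)],
}
print(p_skyline(objects, 0.5))
```

Every instance is compared with every instance of every other object, so the sketch is quadratic in the number of instances; the pruning described in the abstract exists to avoid exactly that cost.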
Selecting Stars: The k Most Representative Skyline Operator
In Proc. of the Int. IEEE Conf. on Data Engineering (ICDE), 2007
Cited by 57 (2 self)
Skyline computation has many applications including multi-criteria decision making. In this paper, we study the problem of selecting k skyline points so that the number of points, which are dominated by at least one of these k skyline points, is maximized. We first present an efficient dynamic programming based exact algorithm in a 2d-space. Then, we show that the problem is NP-hard when the dimensionality is 3 or more and it can be approximately solved by a polynomial time algorithm with the guaranteed approximation ratio 1 − 1/e. To speed up the computation, an efficient, scalable, index-based randomized algorithm is developed by applying the FM probabilistic counting technique. A comprehensive performance evaluation demonstrates that our randomized technique is very efficient, highly accurate, and scalable.
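The abstract does not name its polynomial-time approximation. The sketch below assumes the standard greedy heuristic for maximum coverage, which is known to reach the ratio 1 − 1/e: at each step it picks the skyline point that dominates the most points not yet covered. It assumes distinct points stored as tuples and smaller-is-better dominance; greedy_representatives and the sample data are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every dimension and strictly better in one (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    """Brute-force skyline: points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def greedy_representatives(points, k):
    """Pick up to k skyline points greedily by marginal coverage.

    Returns the chosen points and the number of points they dominate together.
    """
    sky = skyline(points)
    covered = set()
    chosen = []
    for _ in range(min(k, len(sky))):
        best = max(
            (s for s in sky if s not in chosen),
            key=lambda s: sum(1 for p in points if p not in covered and dominates(s, p)),
        )
        chosen.append(best)
        covered.update(p for p in points if dominates(best, p))
    return chosen, len(covered)

# Hypothetical 2-d data set.
points = [(1, 9), (3, 3), (9, 1), (2, 10), (4, 4), (5, 5), (4, 8), (10, 2)]
print(greedy_representatives(points, 2))
```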
Efficient Computation of the Skyline Cube
In VLDB, 2005
Cited by 49 (4 self)
Skyline has been proposed as an important operator for multi-criteria decision making, data mining and visualization, and user-preference queries. In this paper, we consider the problem of efficiently computing a Skycube, which consists of skylines of all possible non-empty subsets of a given set of dimensions. While existing skyline computation algorithms can be immediately extended to computing each skyline query independently, such "shared-nothing" algorithms are inefficient. We develop several computation sharing strategies based on effectively identifying the computation dependencies among multiple related skyline queries. Based on these sharing strategies, two novel algorithms, Bottom-Up and Top-Down algorithms, are proposed to compute Skycube efficiently. Finally, our extensive performance evaluations confirm the effectiveness of the sharing strategies.
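To make the Skycube definition concrete, here is a sketch of the naive "shared-nothing" baseline that the abstract calls inefficient: one independent skyline computation per non-empty subset of dimensions. It is not the Bottom-Up or Top-Down algorithm. It assumes smaller values are better; naive_skycube and the sample points are hypothetical.

```python
from itertools import combinations

def dominates(a, b, dims):
    """Dominance restricted to the dimensions in `dims` (smaller is better)."""
    return all(a[d] <= b[d] for d in dims) and any(a[d] < b[d] for d in dims)

def subspace_skyline(points, dims):
    """Points not dominated by any other point when only `dims` are considered."""
    return [p for p in points if not any(dominates(q, p, dims) for q in points)]

def naive_skycube(points):
    """Map every non-empty subset of dimensions to its skyline, with no sharing between subsets."""
    n_dims = len(points[0])
    cube = {}
    for size in range(1, n_dims + 1):
        for dims in combinations(range(n_dims), size):
            cube[dims] = subspace_skyline(points, dims)
    return cube

# Hypothetical 2-d data: the cube has 2**2 - 1 = 3 subspaces.
points = [(1, 3), (2, 2), (3, 1), (3, 3)]
cube = naive_skycube(points)
for dims, sky in cube.items():
    print(dims, sky)
```

With d dimensions the loop runs 2^d − 1 separate skyline computations, which is the cost the sharing strategies aim to reduce.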
On High Dimensional Skylines
EDBT 2006
Cited by 33 (4 self)
In many decision-making applications, the skyline query is frequently used to find a set of dominating data points (called skyline points) in a multidimensional dataset. In a high-dimensional space skyline points no longer offer any interesting insights as there are too many of them. In this paper, we introduce a novel metric, called skyline frequency that compares and ranks the interestingness of data points based on how often they are returned in the skyline when different number of dimensions (i.e., subspaces) are considered. Intuitively, a point with a high skyline frequency is more interesting as it can be dominated on fewer combinations of the dimensions. Thus, the problem becomes one of finding top-k frequent skyline points. But the algorithms thus far proposed for skyline computation typically do not scale well with dimensionality. Moreover, frequent skyline computation requires that skylines be computed for each of an exponential number of subsets of the dimensions. We present efficient approximate algorithms to address these twin difficulties. Our extensive performance study shows that our approximate algorithm can run fast and compute the correct result on large data sets in high-dimensional spaces.
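Skyline frequency as described here can be computed exactly by brute force on small inputs, which shows why the exponential number of subspaces is a problem. The sketch below is that exact baseline, not the approximate algorithms of the paper. It assumes distinct points stored as tuples and smaller-is-better dominance; skyline_frequency, top_k_frequent and the sample data are hypothetical.

```python
from itertools import combinations

def dominates(a, b, dims):
    """Dominance restricted to the dimensions in `dims` (smaller is better)."""
    return all(a[d] <= b[d] for d in dims) and any(a[d] < b[d] for d in dims)

def skyline_frequency(points):
    """Count, for each point, the non-empty subspaces in which it is a skyline point."""
    n_dims = len(points[0])
    freq = {p: 0 for p in points}
    for size in range(1, n_dims + 1):
        for dims in combinations(range(n_dims), size):
            for p in points:
                if not any(dominates(q, p, dims) for q in points):
                    freq[p] += 1
    return freq

def top_k_frequent(points, k):
    """The k points with the highest skyline frequency; ties are broken by point value."""
    freq = skyline_frequency(points)
    return sorted(points, key=lambda p: (-freq[p], p))[:k]

# Hypothetical 3-d data: 2**3 - 1 = 7 subspaces are examined.
points = [(1, 2, 3), (2, 1, 3), (3, 3, 1), (2, 2, 2), (3, 3, 3)]
freq = skyline_frequency(points)
print(freq)
print(top_k_frequent(points, 2))
```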
SUBSKY: Efficient computation of skylines in subspaces
In ICDE, 2006
Cited by 33 (7 self)
Given a set of multidimensional points, the skyline contains the best points according to any preference function that is monotone on all axes. In practice, applications that require skyline analysis usually provide numerous candidate attributes, and various users depending on their interests may issue queries regarding different (small) subsets of the dimensions. Formally, given a relation with a large number (e.g., > 10) of attributes, a query aims at finding the skyline in an arbitrary subspace with a low dimensionality (e.g., 2). The existing algorithms do not support subspace skyline retrieval efficiently because they (i) require scanning the entire database at least once, or (ii) are optimized for one particular subspace but incur significant overhead for other subspaces. In this paper, we propose a technique SUBSKY which settles the problem using a single B-tree, and can be implemented in any relational database. The core of SUBSKY is a transformation that converts multidimensional data to 1D values, and enables several effective pruning heuristics. Extensive experiments with real data confirm that SUBSKY outperforms alternative approaches significantly in both efficiency and scalability.
Parallelizing skyline queries for scalable distribution
In EDBT’06, 2006
Cited by 30 (2 self)
Skyline queries help users make intelligent decisions over complex data, where different and often conflicting criteria are considered. Current skyline computation methods are restricted to centralized query processors, limiting scalability and imposing a single point of failure. In this paper, we address the problem of parallelizing skyline query execution over a large number of machines by leveraging content-based data partitioning. We present a novel distributed skyline query processing algorithm (DSL) that discovers skyline points progressively. We propose two mechanisms, recursive region partitioning and dynamic region encoding, to enforce a partial order on query propagation in order to pipeline query execution. Our analysis shows that DSL is optimal in terms of the total number of local query invocations across all machines. In addition, simulations and measurements of a deployed system show that our system load balances communication and processing costs across cluster machines, providing incremental scalability and significant performance improvement over alternative distribution mechanisms.
Efficient skyline query processing on peer-to-peer networks
In IEEE International Conference on Data Engineering (ICDE), 2007
Cited by 24 (2 self)
Skyline query has been gaining much interest in database research communities in recent years. Most existing studies focus mainly on centralized systems, and resolving the problem in a distributed environment such as a peer-to-peer (P2P) network is still an emerging topic. The desiderata of efficient skyline querying in P2P environment include: 1) progressive returning of answers, 2) low processing cost in terms of number of peers accessed and search messages, 3) balanced query loads among the peers. In this paper, we propose a solution that satisfies the three desiderata. Our solution is based on a balanced tree structured P2P network. By partitioning the skyline search space adaptively based on query accessing patterns, we are able to alleviate the problem of “hot” spots present in the skyline query processing. By being able to estimate the peer nodes within the query subspaces, we are able to control the amount of query forwarding, limiting the number of peers involved and the amount of messages transmitted in the network. Load balancing is achieved in query load conscious data space splitting/merging during the joining/departure of nodes and through dynamic load migration. Experiments on real and synthetic datasets confirm the effectiveness and scalability of our algorithm on P2P networks.
Algorithms and Analyses for Maximal Vector Computation
Cited by 24 (0 self)
The maximal vector problem is to identify the maximals over a collection of vectors. This arises in many contexts and, as such, has been well studied. The problem recently gained renewed attention with skyline queries for relational databases and with work to develop skyline algorithms that are external and relationally well behaved. While many algorithms have been proposed, how they perform has been unclear. We study the performance of, and design choices behind, these algorithms. We prove runtime bounds based on the number of vectors n and the dimensionality k. Early algorithms based on divide-and-conquer established seemingly good average and worst-case asymptotic runtimes. In fact, the problem can be solved in O(n) average-case (holding k as fixed). We prove, however, that the performance is quite bad with respect to k. We demonstrate that the more recent skyline algorithms are better behaved, and can also achieve O(kn) average-case. While k matters for these, in practice, its effect vanishes in the asymptotic. We introduce a new external algorithm, LESS, that is more efficient and better behaved. We evaluate LESS’s effectiveness and improvement over the field, and prove that its average-case running time is O(kn).
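The maximal vector problem stated in this abstract can be sketched in memory with a presort-and-filter pass. This is not LESS, which is an external algorithm whose details the abstract does not give. The sketch assumes larger values are better and sorts by coordinate sum so that a vector can only be dominated by one that comes earlier in the order; maximal_vectors and the sample data are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as large as b in every dimension and strictly larger in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def maximal_vectors(vectors):
    """Return the vectors that no other vector dominates.

    A dominating vector always has a strictly larger coordinate sum, so after
    sorting by sum (descending) each vector only needs to be compared with the
    maximals found so far.
    """
    maximals = []
    for v in sorted(vectors, key=sum, reverse=True):
        if not any(dominates(m, v) for m in maximals):
            maximals.append(v)
    return maximals

# Hypothetical 2-d data.
vectors = [(1, 5), (4, 4), (5, 1), (3, 3), (2, 2), (4, 1)]
print(maximal_vectors(vectors))
```

The worst case is still quadratic in n when most vectors are maximal, which is the regime where the dimensionality k starts to dominate the cost.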
Relaxing join and selection queries
In VLDB ’06: Proceedings of the 32nd International Conference on Very Large Data Bases, 2006
Cited by 22 (3 self)
Database users can be frustrated by having an empty answer to a query. In this paper, we propose a framework to systematically relax queries involving joins and selections. When considering relaxing a query condition, intuitively one seeks the 'minimal' amount of relaxation that yields an answer. We first characterize the types of answers that we return to relaxed queries. We then propose a lattice based framework in order to aid query relaxation. Nodes in the lattice correspond to different ways to relax queries. We characterize the properties of relaxation at each node and present algorithms to compute the corresponding answer. We then discuss how to traverse this lattice in a way that a non-empty query answer is obtained with the minimum amount of query condition relaxation. We implemented this framework and we present our results of a thorough performance evaluation using real and synthetic data. Our results indicate the practical utility of our framework.
Deltasky: Optimal maintenance of skyline deletions without exclusive dominance region generation
In UCSB Tech Report, 2006. http://www.cs.ucsb.edu/~pingwu/deltasky.pdf, 2007

Cited by 18 (1 self)
This paper addresses the problem of efficient maintenance of a materialized skyline view in response to skyline removals. While there has been significant progress on skyline query computation, an equally important but largely unanswered issue is on the incremental maintenance for skyline deletions. Previous work suggested the use of the so-called exclusive dominance region (EDR) to achieve optimal I/O performance for deletion maintenance. However, the shape of an EDR becomes extremely complex in higher dimensions, and algorithms for its computation have not been developed. We derive a systematic way to decompose a d-dimensional EDR into a collection of hyper-rectangles. We show that the number of such hyper-rectangles is O(m^d), where m is the current skyline result size. We then propose a novel algorithm DeltaSky which determines whether an intermediate R-tree MBR intersects with the EDR without explicitly calculating the EDR itself. This reduces the worst-case complexity of the EDR intersection check from O(m^d) to O(md). Thus DeltaSky helps the branch and bound skyline algorithm achieve I/O optimality for deletion maintenance by finding only the newly appeared skyline points after the deletion. We discuss implementation issues and show that DeltaSky can be efficiently implemented using one extra B-tree. Moreover, we propose two optimization techniques which further reduce the average cost in practice. Extensive experiments demonstrate that DeltaSky achieves orders of magnitude performance gain over alternative solutions.
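The role of the exclusive dominance region can be illustrated with an in-memory sketch of deletion maintenance. It is not DeltaSky: it uses no R-tree or B-tree and tests EDR membership point by point. It assumes distinct points stored as tuples and smaller-is-better dominance; skyline_after_deletion and the sample data are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every dimension and strictly better in one (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    """Brute-force skyline, used here only to build and check the example."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def skyline_after_deletion(points, current_skyline, deleted):
    """Update a materialized skyline after one of its points is deleted.

    Only points dominated by `deleted` and by no other skyline point (its
    exclusive dominance region) can be promoted, so the rest are never examined.
    """
    remaining = [s for s in current_skyline if s != deleted]
    exclusive = [
        p for p in points
        if p != deleted
        and dominates(deleted, p)
        and not any(dominates(s, p) for s in remaining)
    ]
    promoted = [p for p in exclusive if not any(dominates(q, p) for q in exclusive)]
    return remaining + promoted

# Hypothetical 2-d data.
points = [(1, 9), (3, 3), (9, 1), (4, 4), (5, 5), (4, 8), (6, 3)]
current = skyline(points)
print(skyline_after_deletion(points, current, (3, 3)))
```

The result matches a full recomputation over the remaining points, while only the points inside the deleted point's exclusive region are compared with each other.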