Results 1-10 of 88
Probabilistic skylines on uncertain data
In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB’07), Vienna, 2007
Cited by 66 (17 self)
Uncertain data are inherent in some important applications. Although a considerable amount of research has been dedicated to modeling uncertain data and answering some types of queries on uncertain data, how to conduct advanced analysis on uncertain data remains an open problem at large. In this paper, we tackle the problem of skyline analysis on uncertain data. We propose a novel probabilistic skyline model where an uncertain object may take a probability to be in the skyline, and a p-skyline contains all the objects whose skyline probabilities are at least p. Computing probabilistic skylines on large uncertain data sets is challenging. We develop two efficient algorithms. The bottom-up algorithm computes the skyline probabilities of some selected instances of uncertain objects, and uses those instances to prune other instances and uncertain objects effectively. The top-down algorithm recursively partitions the instances of uncertain objects into subsets, and prunes subsets and objects aggressively. Our experimental results on both the real NBA player data set and the benchmark synthetic data sets show that probabilistic skylines are interesting and useful, and that our two algorithms are efficient on large data sets and complementary to each other in performance.
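The p-skyline definition in the abstract above can be illustrated with a brute-force sketch. This is not the paper's bottom-up or top-down algorithm. It assumes that each uncertain object is a finite list of equally likely instances, that objects are independent, and that smaller values are better in every dimension. The function names are hypothetical.

```python
from typing import Dict, List, Sequence

Point = Sequence[float]


def dominates(u: Point, v: Point) -> bool:
    """u dominates v: no worse in every dimension, strictly better in one.

    Assumption: smaller values are preferred in all dimensions.
    """
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def skyline_probabilities(objects: Dict[str, List[Point]]) -> Dict[str, float]:
    """Brute-force skyline probability of every uncertain object.

    Assumption: instances of an object are equally likely and objects are
    independent, so an instance is in the skyline with probability equal to
    the product, over all other objects, of the fraction of that object's
    instances that do NOT dominate it.
    """
    result = {}
    for name, instances in objects.items():
        total = 0.0
        for u in instances:
            prob = 1.0
            for other, other_instances in objects.items():
                if other == name:
                    continue
                dominating = sum(dominates(v, u) for v in other_instances)
                prob *= 1.0 - dominating / len(other_instances)
            total += prob
        # The object's probability is the average over its instances.
        result[name] = total / len(instances)
    return result


def p_skyline(objects: Dict[str, List[Point]], p: float) -> List[str]:
    """Names of all objects whose skyline probability is at least p."""
    probs = skyline_probabilities(objects)
    return sorted(name for name, prob in probs.items() if prob >= p)


# Toy data (made up for illustration): three uncertain objects in 2-d.
toy_objects = {"A": [(1, 1), (3, 3)], "B": [(2, 2)], "C": [(4, 4)]}
```

On the toy data, `p_skyline(toy_objects, 0.5)` keeps A and B, each with skyline probability 0.5, and drops C, which is always dominated. The sketch compares every instance against every other instance, which is the quadratic cost that pruning strategies such as those the abstract describes aim to avoid.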
Lower bounds for orthogonal range searching: I. the reporting case
Journal of the ACM, 1990
Cited by 64 (4 self)
We establish lower bounds on the complexity of orthogonal range reporting in the static case. Given a collection of n points in d-space and a box [a_1, b_1] × ... × [a_d, b_d], report every point whose ith coordinate lies in [a_i, b_i], for each i = 1, ..., d. The collection of points is fixed once and for all and can be preprocessed. The box, on the other hand, constitutes a query that must be answered online. It is shown that on a pointer machine a query time of O(k + polylog(n)), where k is the number of points to be reported, can only be achieved at the expense of Ω(n (log n / log log n)^(d-1)) storage. Interestingly, these bounds are optimal in the pointer machine model, but they can be improved (ever so slightly) on a random access machine. In a companion paper, we address the related problem of adding up weights assigned to the points in the query box.
Selecting Stars: The k Most Representative Skyline Operator
In Proc. of the Int. IEEE Conf. on Data Engineering (ICDE), 2007
Cited by 59 (2 self)
Skyline computation has many applications including multicriteria decision making. In this paper, we study the problem of selecting k skyline points so that the number of points that are dominated by at least one of these k skyline points is maximized. We first present an efficient dynamic programming based exact algorithm in a 2d-space. Then, we show that the problem is NP-hard when the dimensionality is 3 or more and that it can be approximately solved by a polynomial time algorithm with the guaranteed approximation ratio 1 − 1/e. To speed up the computation, an efficient, scalable, index-based randomized algorithm is developed by applying the FM probabilistic counting technique. A comprehensive performance evaluation demonstrates that our randomized technique is very efficient, highly accurate, and scalable.
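The selection problem in the abstract above is a maximum coverage problem, for which the standard greedy heuristic is known to achieve a 1 − 1/e approximation ratio. The sketch below shows that generic greedy heuristic. Whether it matches the paper's polynomial time algorithm in detail is an assumption, and it is neither the dynamic programming algorithm nor the FM-based randomized one. It assumes smaller values are better, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """u dominates v (assumption: smaller is better in every dimension)."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def skyline(points: List[Point]) -> List[Point]:
    """Points that no other point dominates (quadratic brute force)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def greedy_representatives(points: List[Point], k: int) -> Tuple[List[Point], int]:
    """Greedily pick up to k skyline points to maximize dominated points.

    Returns the chosen skyline points and how many points they dominate.
    """
    sky = skyline(points)
    # For each skyline point, the indices of the points it dominates.
    dominated = {
        i: {j for j, q in enumerate(points) if dominates(s, q)}
        for i, s in enumerate(sky)
    }
    covered = set()
    chosen = []
    remaining = set(range(len(sky)))
    for _ in range(min(k, len(sky))):
        # Pick the skyline point that adds the most uncovered dominated points.
        best = max(sorted(remaining), key=lambda i: len(dominated[i] - covered))
        chosen.append(sky[best])
        covered |= dominated[best]
        remaining.discard(best)
    return chosen, len(covered)


# Toy data (made up for illustration).
toy_points = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4), (2, 6), (6, 2)]
```

On the toy data, `greedy_representatives(toy_points, 1)` picks (2, 2), which alone dominates four of the seven points. The other two skyline points dominate only points that (2, 2) already covers, so raising k to 2 adds nothing here.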
Maximal Vector Computation in Large Data Sets
In VLDB, 2005
Cited by 59 (1 self)
Finding the maximals in a collection of vectors is relevant to many applications. The maximal set is related to the convex hull (and hence, linear optimization) and nearest neighbors. The maximal vector problem has resurfaced with the advent of skyline queries for relational databases and skyline algorithms that are external and relationally well behaved. The initial
Efficient Computation of the Skyline Cube
In VLDB, 2005
Cited by 50 (4 self)
Skyline has been proposed as an important operator for multicriteria decision making, data mining and visualization, and user-preference queries. In this paper, we consider the problem of efficiently computing a Skycube, which consists of skylines of all possible nonempty subsets of a given set of dimensions. While existing skyline computation algorithms can be immediately extended to computing each skyline query independently, such "shared-nothing" algorithms are inefficient. We develop several computation sharing strategies based on effectively identifying the computation dependencies among multiple related skyline queries. Based on these sharing strategies, two novel algorithms, Bottom-Up and Top-Down algorithms, are proposed to compute Skycube efficiently. Finally, our extensive performance evaluations confirm the effectiveness of the sharing strategies. It is
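To make the Skycube definition above concrete, the sketch below computes the skyline of every nonempty subset of dimensions independently. This is the "shared-nothing" baseline the abstract calls inefficient, not the Bottom-Up or Top-Down algorithm. It assumes smaller values are better, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates_on(u: Sequence[float], v: Sequence[float], dims: Sequence[int]) -> bool:
    """u dominates v when only the dimensions in dims are compared.

    Assumption: smaller values are preferred.
    """
    return all(u[d] <= v[d] for d in dims) and any(u[d] < v[d] for d in dims)


def subspace_skyline(points: List[Point], dims: Sequence[int]) -> List[Point]:
    """Skyline of points restricted to the given dimensions (brute force)."""
    return [p for p in points if not any(dominates_on(q, p, dims) for q in points)]


def naive_skycube(points: List[Point]) -> Dict[Tuple[int, ...], List[Point]]:
    """Map every nonempty subset of dimensions to its skyline.

    Each of the 2^d - 1 subspaces is computed from scratch, with no sharing
    of work between related subspaces.
    """
    d = len(points[0])
    cube = {}
    for size in range(1, d + 1):
        for dims in combinations(range(d), size):
            cube[dims] = subspace_skyline(points, dims)
    return cube


# Toy data (made up for illustration): four points in 2-d.
toy_points = [(1, 3), (2, 2), (3, 1), (3, 3)]
toy_cube = naive_skycube(toy_points)
```

On the toy data the cube has three entries: dimension 0 alone keeps (1, 3), dimension 1 alone keeps (3, 1), and the full space keeps (1, 3), (2, 2), and (3, 1).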
Multiobjective query processing for database systems
In International Conference on Very Large Data Bases (VLDB), 2004
Cited by 48 (10 self)
Query processing in database systems has developed beyond mere exact matching of attribute values. Scoring database objects and retrieving only the top k matches or Pareto-optimal result sets (skyline queries) are already common for a variety of applications. Specialized algorithms using either paradigm can avoid naïve linear database scans and thus improve scalability. However, these paradigms are only two extreme cases of exploring viable compromises for each user’s objectives. To find the correct result set for arbitrary cases of multiobjective query processing in databases, we will present a novel algorithm for computing sets of objects that are nondominated with respect to a set of monotonic objective functions. Naturally containing top k and skyline retrieval paradigms as special cases, this algorithm maintains scalability also for all cases in between. Moreover, we will show the algorithm’s correctness and instance-optimality in terms of necessary object accesses and how the response behavior can be improved by progressively producing result objects as quickly as possible, while the algorithm is still running.
Preference Queries with SV-Semantics
2005
Cited by 38 (8 self)
Personalization of database queries requires a semantically rich, easy to handle and flexible preference model. Building on preferences as strict partial orders we provide a variety of intuitive base preference constructors for numerical and categorical data, including so-called d-parameters. As a novel semantic concept for complex preferences we introduce the notion of ‘substitutable values’ (SV-semantics), characterizing equally good values amongst indifferent values. Pareto and prioritized preference construction preserves strict partial orders, which instantly solves crucial well-known problems for preference queries. We can point out a new semantic-guided way to cope with the infamous flooding effect of query engines. Contrary to a widespread belief we can give evidence that the result sizes of Pareto or skyline queries do not necessarily explode for multiple attributes. Moreover, we can show that known laws from preference relational algebra remain valid under SV-semantics. Since most of these laws rely on transitivity, preservation of strict partial order is essential to algebraically optimize complex preference queries. Similarly, well-known efficient evaluation algorithms for the preference selection operator rely on transitivity. In a nutshell, preference constructors with SV-semantics enable an intuitive and powerful personalization of database queries and at the same time are the key to efficient preference query evaluation.
Robust cardinality and cost estimation for skyline operator
In ICDE, 2006
Cited by 33 (0 self)
Incorporating the skyline operator inside the relational engine requires solving the cardinality estimation and the cost estimation problem, hitherto unaddressed. We propose robust techniques to estimate the cardinality and the computational cost of Skyline, and through an empirical comparison, show that our technique is substantially more effective than traditional approaches. Finally, we show through an implementation in Microsoft SQL Server that skyline queries can substantially benefit from our techniques.
Refreshing the sky: the compressed skycube with efficient support for frequent updates
In SIGMOD, 2006
Cited by 33 (0 self)
The skyline query is important in many applications such as multicriteria decision making, data mining, and user-preference queries. Given a set of d-dimensional objects, the skyline query finds the objects that are not dominated by others. In practice, different users may be interested in different dimensions of the data, and issue queries on any subset of the d dimensions. This paper focuses on supporting concurrent and unpredictable subspace skyline queries in frequently updated databases. Simply computing and storing the skyline objects of every subspace in a skycube incurs an expensive update cost. In this paper, we investigate the important issue of updating the skycube in a dynamic environment. To balance the query cost and update cost, we propose a new structure, the compressed skycube, which concisely represents the complete skycube. We thoroughly explore the properties of the compressed skycube and provide an efficient object-aware update scheme. Experimental results show that the compressed skycube is both query and update efficient.
Rectilinear Full Steiner Tree Generation
Networks, 1997
Cited by 27 (5 self)
The fastest exact algorithm (in practice) for the rectilinear Steiner tree problem in the plane uses a two-phase scheme: First a small but sufficient set of full Steiner trees (FSTs) is generated and then a Steiner minimum tree is constructed from this set by using simple backtrack search, dynamic programming or an integer programming formulation. FST generation methods can be seen as problem reduction algorithms and are also useful as a first step in providing good upper and lower bounds for large instances. Currently, the time needed to generate FSTs poses a significant overhead for FST-based exact algorithms. In this paper we present a very efficient algorithm for the rectilinear FST generation problem which removes this overhead completely. Based on information obtained in a preprocessing phase, the new algorithm "grows" FSTs while applying several new and important optimality conditions. For randomly generated instances approximately 4n FSTs are generated (where n is the number o...