Results 11-20 of 553
Catching the best views of skyline: A semantic approach based on decisive subspaces
In VLDB, 2005
Cited by 82 (12 self)
The skyline operator is important for multi-criteria decision-making applications. Although many recent studies developed efficient methods to compute skyline objects in a specific space, the fundamental problem on the semantics of skylines remains open: why, and in which subspaces, is (or is not) an object in the skyline? In practice, users may also be interested in the skylines of arbitrary subspaces. What, then, is the relationship between the skylines in the subspaces and those in the superspaces? How can we effectively analyze subspace skylines? Can we efficiently compute skylines in various subspaces? In this paper, we investigate the semantics of skylines, propose subspace skyline analysis, and extend full-space skyline computation to subspace skyline computation. We introduce a novel notion of skyline group, which is essentially a group of objects that are coincidentally in the skylines of some subspaces. We identify the decisive subspaces that qualify skyline groups in the subspace skylines. The new notions concisely capture the semantics and the structures of skylines in various subspaces. Multidimensional roll-up and drill-down analysis is introduced. We also develop ...
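To make the question "in which subspaces is an object in the skyline?" concrete, here is a brute-force Python sketch. It assumes smaller values are better on every dimension and simply tests each non-empty subspace; it does not implement the paper's skyline groups or decisive subspaces, and the function names are hypothetical.

```python
from itertools import combinations


def dominates(p, q, dims):
    """True if p dominates q on the dimensions in dims (smaller is better)."""
    return all(p[i] <= q[i] for i in dims) and any(p[i] < q[i] for i in dims)


def in_subspace_skyline(data, idx, dims):
    """True if no other object dominates data[idx] in the subspace dims."""
    return not any(
        dominates(other, data[idx], dims)
        for j, other in enumerate(data)
        if j != idx
    )


def skyline_subspaces(data, idx):
    """Every non-empty subspace (tuple of dimension indices) whose skyline contains data[idx]."""
    d = len(data[0])
    return [
        dims
        for r in range(1, d + 1)
        for dims in combinations(range(d), r)
        if in_subspace_skyline(data, idx, dims)
    ]


data = [(1, 5, 3), (2, 2, 4), (3, 1, 1)]
print(skyline_subspaces(data, 1))  # [(0, 1), (0, 1, 2)]
```

Object 1 is in no single-dimension skyline, yet it belongs to the skyline of subspace (0, 1) and of the full space. Because every dimension in this data has distinct values, an object in a subspace skyline also appears in the skyline of every superspace, which the outputs reflect.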
Continuous monitoring of top-k queries over sliding windows
In SIGMOD, 2006
Cited by 79 (8 self)
Given a dataset P and a preference function f, a top-k query retrieves the k tuples in P with the highest scores according to f. Even though the problem is well-studied in conventional databases, the existing methods are inapplicable to highly dynamic environments involving numerous long-running queries. This paper studies continuous monitoring of top-k queries over a fixed-size window W of the most recent data. The window size can be expressed either in terms of the number of active tuples or in time units. We propose a general methodology for top-k monitoring that restricts processing to the subdomains of the workspace that influence the result of some query. To cope with high stream rates and provide fast answers in an online fashion, the data in W reside in main memory. The valid records are indexed by a grid structure, which also maintains bookkeeping information. We present two processing techniques: the first computes the new answer of a query whenever some of the current top-k points expire; the second partially precomputes the future changes in the result, achieving better running time at the expense of slightly higher space requirements. We analyze the performance of both algorithms and evaluate their efficiency through extensive experiments. Finally, we extend the proposed framework to other query types and a different data stream model.
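As a point of reference for the first technique, the following sketch keeps a count-based window and recomputes the top-k from scratch only when a current result tuple expires; otherwise only the new arrival can enter the result. It is a naive in-memory baseline that assumes a count-based window and a user-supplied scoring function; it has no grid index and none of the paper's bookkeeping, and the class name is hypothetical.

```python
import heapq
from collections import deque


class NaiveTopKWindow:
    """Top-k over a count-based sliding window (hypothetical baseline, no grid index)."""

    def __init__(self, window_size, k, score):
        self.window = deque()  # the window_size most recent tuples, oldest first
        self.window_size = window_size
        self.k = k
        self.score = score  # preference function f
        self.result = []  # current top-k, highest score first

    def insert(self, tup):
        self.window.append(tup)
        expired = None
        if len(self.window) > self.window_size:
            expired = self.window.popleft()
        if expired is not None and expired in self.result:
            # A top-k tuple left the window: recompute from the whole window.
            self.result = heapq.nlargest(self.k, self.window, key=self.score)
        else:
            # The expiry did not touch the result; only the new tuple may enter it.
            self.result = heapq.nlargest(self.k, self.result + [tup], key=self.score)
        return self.result


w = NaiveTopKWindow(window_size=3, k=2, score=lambda x: x)
for x in [5, 1, 3, 2, 0]:
    print(x, w.insert(x))
```

The full recomputation is triggered only when an expiring tuple is part of the answer, which is the situation the paper's first technique targets with its grid-based processing.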
Finding k-dominant skylines in high dimensional space
SIGMOD
Cited by 73 (9 self)
Given a d-dimensional data set, a point p dominates another point q if it is better than or equal to q in all dimensions and better than q in at least one dimension. A point is a skyline point if there does not exist any point that can dominate it. Skyline queries, which return skyline points, are useful in many decision-making applications. Unfortunately, as the number of dimensions increases, the chance of one point dominating another becomes very low. As a result, the skyline points become too numerous to offer any interesting insights. To find more important and meaningful skyline points in high-dimensional space, we propose a new concept, called the k-dominant skyline, which relaxes the idea of dominance to k-dominance. A point p is said to k-dominate another point q if there are k (≤ d) dimensions in which p is better than or equal to q and is better in at least one of these k dimensions. A point that is not k-dominated by any other point is in the k-dominant skyline. We prove various properties of the k-dominant skyline. In particular, because k-dominance is not transitive, existing skyline algorithms cannot be adapted for the k-dominant skyline. We then present several new algorithms for finding the k-dominant skyline and its variants. Extensive experiments show that our methods can answer different queries on both synthetic and real data sets efficiently.
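The k-dominance test and a brute-force k-dominant skyline can be sketched as follows, assuming smaller values are better. Since k-dominance is not transitive, a point that is itself k-dominated may still be the only witness against another point, so each point is compared against every other point rather than only against surviving candidates. The names are hypothetical and this is not one of the paper's algorithms.

```python
def k_dominates(p, q, k):
    """True if p k-dominates q (smaller is better): p <= q on at least k
    dimensions and p < q on at least one of those dimensions."""
    le = [i for i in range(len(p)) if p[i] <= q[i]]
    return len(le) >= k and any(p[i] < q[i] for i in le)


def k_dominant_skyline(points, k):
    # k-dominance is not transitive, so every point is checked against
    # all other points, including points that are themselves k-dominated.
    return [
        p for i, p in enumerate(points)
        if not any(k_dominates(q, p, k) for j, q in enumerate(points) if j != i)
    ]


pts = [(1, 2, 3), (2, 3, 1), (3, 1, 2)]
print(k_dominant_skyline(pts, 3))  # [(1, 2, 3), (2, 3, 1), (3, 1, 2)]
print(k_dominant_skyline(pts, 2))  # []
```

The three points form a cycle under 2-dominance (each 2-dominates the next), so the 2-dominant skyline is empty even though all three are in the ordinary skyline, which is the case k = d.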
The spatial skyline queries
In VLDB, 2006
Cited by 73 (7 self)
In this paper, for the first time, we introduce the concept of Spatial Skyline Queries (SSQ). Given a set of data points P and a set of query points Q, each data point has a number of derived spatial attributes, each of which is the point’s distance to a query point. An SSQ retrieves those points of P which are not dominated by any other point in P considering their derived spatial attributes. The main difference from the regular skyline query is that this spatial domination depends on the location of the query points Q. SSQ has applications in several domains such as emergency response and online maps. The main intuition and novelty behind our approaches is that we exploit the geometric properties of the SSQ problem space to avoid the exhaustive examination of all the point pairs in P and Q. Consequently, we reduce the complexity of SSQ search from O(|P|^2 |Q|) to ...
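For intuition, the definition translates directly into the exhaustive O(|P|^2 |Q|) baseline that the paper improves on: compute each point's derived attributes (its distances to all query points) and keep the points that no other point dominates on them. This sketch assumes Euclidean distance and uses `math.dist` (Python 3.8+); it does not use the geometric pruning the paper proposes.

```python
from math import dist


def spatial_skyline(P, Q):
    """Brute-force SSQ: points of P not dominated on their distances to all of Q."""
    # Derived spatial attributes: Euclidean distance from each p to every query point.
    derived = [[dist(p, q) for q in Q] for p in P]

    def dominates(a, b):
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    return [
        P[i] for i in range(len(P))
        if not any(dominates(derived[j], derived[i]) for j in range(len(P)) if j != i)
    ]


Q = [(0, 0), (4, 0)]
P = [(2, 0), (2, 3), (10, 0), (0, 1)]
print(spatial_skyline(P, Q))  # [(2, 0), (0, 1)]
```

Moving the query points changes the derived attributes and therefore the answer, which is the sense in which spatial domination depends on Q.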
SPARK: Top-k keyword query in relational databases
In Proceedings of SIGMOD, 2007
Cited by 73 (3 self)
With the increasing amount of text data stored in relational databases, there is a demand for RDBMSs to support keyword queries over text data. As a search result is often assembled from multiple relational tables, traditional IR-style ranking and query evaluation methods cannot be applied directly. In this paper, we study the effectiveness and the efficiency issues of answering top-k keyword queries in relational database systems. We propose a new ranking formula by adapting existing IR techniques based on a natural notion of virtual document. Compared with previous approaches, our new ranking method is simple yet effective, and agrees with human perceptions. We also study efficient query processing methods for the new ranking method, and propose algorithms that have minimal accesses to the database. We have conducted extensive experiments on large-scale real databases using two popular RDBMSs. The experimental results demonstrate significant improvement over the alternative approaches in terms of retrieval effectiveness and efficiency.
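The idea of ranking a joined result as a single virtual document can be illustrated with a generic TF-IDF score over the pooled terms of all tuples in the join. This is not SPARK's ranking formula, whose details are not given here; the function, the double-log term-frequency damping, and the smoothed IDF are assumptions chosen only for illustration.

```python
import math
from collections import Counter


def virtual_document_score(joined_tuples, query, doc_freq, num_docs):
    """Score a joined tuple tree as ONE virtual document (hypothetical TF-IDF variant)."""
    # Pool the terms of every tuple in the join result into a single bag of words.
    terms = Counter(w for text in joined_tuples for w in text.lower().split())
    score = 0.0
    for w in set(query.lower().split()):
        tf = terms[w]
        if tf == 0:
            continue  # keyword absent from the virtual document
        idf = math.log((num_docs + 1) / (doc_freq.get(w, 0) + 1))
        # Double-log damping of term frequency, in the style of pivoted normalization.
        score += (1 + math.log(1 + math.log(tf))) * idf
    return score


df = {"database": 10, "skyline": 2}
both = virtual_document_score(["Skyline queries", "database systems"], "skyline database", df, 100)
one = virtual_document_score(["database systems"], "skyline database", df, 100)
print(both > one)  # True
```

Because terms are pooled across the joined tuples, a result that covers more query keywords through its join partners can outrank a single tuple that matches only one keyword.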
Stratified computation of skylines with partially-ordered domains
Proc. of the ACM SIGMOD Int'l Conf. on Management of Data, 2005
Cited by 70 (2 self)
In this paper, we study the evaluation of skyline queries with partially-ordered attributes. Because such attributes lack a total ordering, traditional index-based evaluation algorithms (e.g., NN and BBS) that are designed for totally-ordered attributes can no longer prune the space as effectively. Our solution is to transform each partially-ordered attribute into a two-integer domain that allows us to exploit index-based algorithms to compute skyline queries on the transformed space. Based on this framework, we propose three novel algorithms: BBS+ is a straightforward adaptation of BBS using the framework, while SDC (Stratification by Dominance Classification) and SDC+ are optimized to handle false positives and support progressive evaluation. Both SDC and SDC+ exploit a dominance relationship to organize the data into strata. While SDC generates its strata at runtime, SDC+ partitions the data into strata offline. We also design two dominance classification strategies (MinPC and MaxPC) to further optimize the performance of SDC and SDC+. We implemented the proposed schemes and evaluated their efficiency. Our results show that the proposed techniques outperform existing approaches by a wide margin, with SDC+MinPC giving the best performance in terms of both response time and progressiveness. To the best of our knowledge, this is the first paper to address the problem of skyline query evaluation involving partially-ordered attribute domains.
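The core difficulty, dominance over an attribute whose values form a partial order, can be shown with a brute-force skyline in which one attribute is numeric and the other is compared by reachability in a preference DAG. The (price, brand) schema and the function names are hypothetical, and this sketch does not implement the paper's two-integer transformation, BBS+, or SDC/SDC+.

```python
def preferred(graph, a, b):
    """True if value a is strictly preferred to b, i.e. b is reachable from a in the DAG."""
    stack, seen = list(graph.get(a, ())), set()
    while stack:
        v = stack.pop()
        if v == b:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(graph.get(v, ()))
    return False


def dominates(p, q, graph):
    """p, q are (price, brand): lower price is better, brands follow the preference DAG."""
    brand_better = preferred(graph, p[1], q[1])
    brand_ok = p[1] == q[1] or brand_better
    return p[0] <= q[0] and brand_ok and (p[0] < q[0] or brand_better)


def skyline(tuples, graph):
    # Brute force: keep tuples that no other tuple dominates.
    return [
        t for i, t in enumerate(tuples)
        if not any(dominates(u, t, graph) for j, u in enumerate(tuples) if j != i)
    ]


graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}  # A over B and C; B and C over D
cars = [(100, "A"), (80, "B"), (80, "C"), (90, "D"), (120, "B")]
print(skyline(cars, graph))  # [(100, 'A'), (80, 'B'), (80, 'C')]
```

Brands B and C are incomparable, so (80, "B") and (80, "C") both survive, while (90, "D") and (120, "B") are dominated.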
Efficient Computation of the Skyline Cube
In VLDB, 2005
Cited by 67 (5 self)
Skyline has been proposed as an important operator for multi-criteria decision making, data mining and visualization, and user-preference queries. In this paper, we consider the problem of efficiently computing a Skycube, which consists of the skylines of all possible non-empty subsets of a given set of dimensions. While existing skyline computation algorithms can be immediately extended to compute each skyline query independently, such "shared-nothing" algorithms are inefficient. We develop several computation-sharing strategies based on effectively identifying the computation dependencies among multiple related skyline queries. Based on these sharing strategies, two novel algorithms, the Bottom-Up and Top-Down algorithms, are proposed to compute the Skycube efficiently. Finally, our extensive performance evaluations confirm the effectiveness of the sharing strategies. It is ...
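The shared-nothing approach that the paper takes as its starting point is easy to state: run one independent skyline query per non-empty subset of the dimensions, that is, 2^d - 1 queries. The sketch below does exactly that with a brute-force skyline (smaller is better, an assumption); the Bottom-Up and Top-Down sharing strategies are not shown, and the names are hypothetical.

```python
from itertools import combinations


def skyline(points, dims):
    """Indices of points not dominated on the dimensions in dims (smaller is better)."""
    def dom(p, q):
        return all(p[i] <= q[i] for i in dims) and any(p[i] < q[i] for i in dims)

    return [
        i for i, p in enumerate(points)
        if not any(dom(q, p) for j, q in enumerate(points) if j != i)
    ]


def skycube(points):
    """Shared-nothing baseline: one independent skyline query per non-empty subspace."""
    d = len(points[0])
    return {
        dims: skyline(points, dims)
        for r in range(1, d + 1)
        for dims in combinations(range(d), r)
    }


print(skycube([(1, 5), (2, 2), (5, 1)]))  # {(0,): [0], (1,): [2], (0, 1): [0, 1, 2]}
```

Each subspace repeats all of its dominance tests from scratch; the sharing strategies in the paper aim to reuse work across related subspaces instead.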
Semantics of ranking queries for probabilistic data and expected ranks
In Proc. of ICDE’09, 2009
Cited by 62 (1 self)
When dealing with massive quantities of data, top-k queries are a powerful technique for returning only the k most relevant tuples for inspection, based on a scoring function. The problem of efficiently answering such ranking queries has been studied and analyzed extensively within traditional database settings. The importance of the top-k is perhaps even greater in probabilistic databases, where a relation can encode exponentially many possible worlds. There have been several recent attempts to propose definitions and algorithms for ranking queries over probabilistic data. However, these all lack many of the intuitive properties of a top-k query over deterministic data. Specifically, we define a number of fundamental properties, including exact-k, containment, unique-rank, value-invariance, and stability, which are all satisfied by ranking queries on certain data. We argue that all these conditions should also be fulfilled by any reasonable definition for ranking uncertain data. Unfortunately, none of the existing definitions is able to achieve this. To remedy this shortcoming, this work proposes an intuitive new approach of expected rank. This uses the well-founded notion of the expected rank of each tuple across all possible worlds as the basis of the ranking. We are able to prove that, in contrast to all existing approaches, the expected rank satisfies all the required properties for a ranking query. We provide efficient solutions to compute this ranking across the major models of uncertain data, such as attribute-level and tuple-level uncertainty. For an uncertain relation of N tuples, the processing cost is O(N log N), no worse than simply sorting the relation. In settings where there is a high cost for generating each tuple in turn, we provide pruning techniques based on probabilistic tail bounds that can terminate the search early and guarantee that the top-k has been found. Finally, a comprehensive experimental study confirms the effectiveness of our approach.
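For small inputs, the expected rank can be computed directly by enumerating all possible worlds. This sketch assumes tuple-level uncertainty with independent tuples (no exclusion rules) and distinct scores, counts a present tuple's rank as the number of present tuples with a higher score, and assigns an absent tuple the rank |W| of the world (an assumption about the convention). It takes exponential time, unlike the O(N log N) algorithms described above, and the function names are hypothetical.

```python
from itertools import product


def expected_ranks(tuples):
    """tuples: list of (score, probability) under independent tuple-level uncertainty."""
    n = len(tuples)
    exp = [0.0] * n
    for present in product([False, True], repeat=n):
        # Probability of this possible world.
        pw = 1.0
        for (_, p), inc in zip(tuples, present):
            pw *= p if inc else 1 - p
        if pw == 0:
            continue
        world = [i for i in range(n) if present[i]]
        for i in range(n):
            if present[i]:
                # 0-based rank: number of present tuples with a higher score.
                rank = sum(1 for j in world if tuples[j][0] > tuples[i][0])
            else:
                rank = len(world)  # absent tuples rank after every present one
            exp[i] += pw * rank
    return exp


def top_k_expected_rank(tuples, k):
    """Indices of the k tuples with the smallest expected rank."""
    r = expected_ranks(tuples)
    return sorted(range(len(tuples)), key=lambda i: r[i])[:k]


data = [(100, 0.4), (90, 0.9), (80, 1.0)]
print(top_k_expected_rank(data, 2))  # [1, 0]
```

Tuple 0 has the highest score but appears with probability only 0.4, so tuple 1 (score 90, probability 0.9) gets the smallest expected rank.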
An efficient and scalable approach to CNN queries in a road network
In Proc. of VLDB, 2005
Cited by 60 (0 self)
A continuous search in a road network retrieves the objects which satisfy a query condition at any point on a path. For example, return the three nearest restaurants from all locations on my route from point s to point e. In this paper, we deal with NN queries as well as continuous NN (CNN) queries in the context of moving objects databases. The performance of existing approaches based on network distance, such as the shortest path length, depends largely on the density of objects of interest. To overcome this problem, we propose UNICONS (a unique continuous search algorithm) for NN queries and CNN queries performed on a network. We incorporate the use of precomputed NN lists into Dijkstra’s algorithm for NN queries. A mathematical rationale is employed to produce the final results of CNN queries. Experimental results for real-life datasets of various sizes show that UNICONS outperforms its competitors by up to 3.5 times for NN queries and 5 times for CNN queries, depending on the density of objects and the number of NNs required.
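The baseline that UNICONS builds on, an NN search by network distance, can be sketched as Dijkstra's algorithm expanded from the query node until k object-bearing nodes have been settled. The sketch assumes objects sit on graph nodes and the road network is an undirected adjacency list; it does not use UNICONS's precomputed NN lists or handle continuous queries, and the names are hypothetical.

```python
import heapq


def build_graph(edges):
    """Undirected adjacency list: node -> list of (neighbor, edge_length)."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    return graph


def network_knn(graph, source, objects, k):
    """First k nodes in `objects` by shortest-path distance from source."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    done, result = set(), []
    while heap and len(result) < k:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)  # d is now the exact network distance to u
        if u in objects:
            result.append((u, d))
        for v, w in graph.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return result


g = build_graph([("s", "a", 2), ("s", "b", 5), ("a", "c", 1), ("b", "c", 1), ("c", "d", 4)])
print(network_knn(g, "s", {"b", "c", "d"}, 2))  # [('c', 3.0), ('b', 4.0)]
```

The cost of this search grows with how far the expansion must travel before k objects are found, which is why its performance depends on object density, the issue the abstract highlights.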
Efficient Computation of Reverse Skyline Queries
2007
Cited by 59 (0 self)
In this paper, for the first time, we introduce the concept of Reverse Skyline Queries. First, we consider, for a multidimensional data set P, the problem of dynamic skyline queries according to a query point q. This kind of dynamic skyline corresponds to the skyline of a transformed data space where point q becomes the origin and all points of P are represented by their distance vector to q. The reverse skyline query returns the objects whose dynamic skyline contains the query object q. In order to compute the reverse skyline of an arbitrary query point, we first propose a Branch and Bound algorithm (called BBRS), which is an improved customization of the original BBS algorithm. Furthermore, we identify a superset of the reverse skyline that is used to bound the search space while computing the reverse skyline. To further reduce the computational cost of determining whether a point belongs to the reverse skyline, we propose an enhanced algorithm (called RSSA) that is based on accurate precomputed approximations of the skylines. These approximations are used to identify whether a point belongs to the reverse skyline or not. Through extensive experiments with both real-world and synthetic datasets, we show that our algorithms can efficiently support reverse skyline queries. Our enhanced approach improves reverse skyline processing by up to an order of magnitude compared to the algorithm without precomputed approximations.
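A brute-force check follows the definitions directly: p is in the reverse skyline of q exactly when no other point of P is at least as close to p as q is in every dimension and strictly closer in at least one, i.e. when q is in p's dynamic skyline. The sketch assumes absolute per-dimension differences as the distance vector and performs O(|P|^2) dominance tests; it does not implement BBRS or RSSA, and the function names are hypothetical.

```python
def dyn_dominates(a, b, ref):
    """True if a dynamically dominates b w.r.t. ref: |a - ref| <= |b - ref| in
    every dimension and < in at least one."""
    da = [abs(x - r) for x, r in zip(a, ref)]
    db = [abs(x - r) for x, r in zip(b, ref)]
    return all(x <= y for x, y in zip(da, db)) and any(x < y for x, y in zip(da, db))


def reverse_skyline(P, q):
    """Points p of P whose dynamic skyline (p as origin) contains q."""
    return [
        p for i, p in enumerate(P)
        if not any(dyn_dominates(other, q, p) for j, other in enumerate(P) if j != i)
    ]


print(reverse_skyline([(1, 1), (5, 5), (3, 3)], (4, 4)))  # [(5, 5), (3, 3)]
```

Point (1, 1) drops out because (3, 3) is closer to it than q = (4, 4) in both dimensions.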