Continuous monitoring of top-k queries over sliding windows
In SIGMOD, 2006
Cited by 63 (7 self)
Given a dataset P and a preference function f, a top-k query retrieves the k tuples in P with the highest scores according to f. Even though the problem is well-studied in conventional databases, the existing methods are inapplicable to highly dynamic environments involving numerous long-running queries. This paper studies continuous monitoring of top-k queries over a fixed-size window W of the most recent data. The window size can be expressed either in terms of the number of active tuples or time units. We propose a general methodology for top-k monitoring that restricts processing to the subdomains of the workspace that influence the result of some query. To cope with high stream rates and provide fast answers in an online fashion, the data in W reside in main memory. The valid records are indexed by a grid structure, which also maintains bookkeeping information. We present two processing techniques: the first one computes the new answer of a query whenever some of the current top-k points expire; the second one partially precomputes the future changes in the result, achieving better running time at the expense of slightly higher space requirements. We analyze the performance of both algorithms and evaluate their efficiency through extensive experiments. Finally, we extend the proposed framework to other query types and a different data stream model.
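The first processing technique (recompute a query's answer only when one of its current top-k tuples expires) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: the window is count-based, the scoring function is supplied by the caller, and a full rescan of W stands in for the grid index and its bookkeeping. The class name `SlidingTopK` and its methods are hypothetical.

```python
from collections import deque
import heapq


class SlidingTopK:
    """Hypothetical sketch: top-k over a count-based window of the N most recent tuples.

    A full rescan of the window replaces the paper's grid index (assumption).
    """

    def __init__(self, window_size, k, score):
        self.n, self.k, self.score = window_size, k, score
        self.window = deque()  # (arrival_seq, tuple) in arrival order
        self.seq = 0
        self.top = []  # (score, arrival_seq, tuple), best first

    def _recompute(self):
        # Rescan every valid tuple in W; only needed when a top-k tuple expires.
        self.top = heapq.nlargest(
            self.k, ((self.score(t), s, t) for s, t in self.window))

    def insert(self, t):
        """Append t, expire the oldest tuple if W overflows, return the current top-k."""
        self.window.append((self.seq, t))
        entry = (self.score(t), self.seq, t)
        self.seq += 1
        expired_top = False
        if len(self.window) > self.n:
            old_seq, _ = self.window.popleft()
            expired_top = any(s == old_seq for _, s, _ in self.top)
        if expired_top:
            self._recompute()
        else:
            # The old result is still valid, so the new tuple only competes with it.
            self.top = heapq.nlargest(self.k, self.top + [entry])
        return [x for _, _, x in self.top]


monitor = SlidingTopK(window_size=3, k=2, score=lambda x: x)
print([monitor.insert(v) for v in [5, 1, 3, 2, 4]])
# [[5], [5, 1], [5, 3], [3, 2], [4, 3]]
```

Arrival sequence numbers break score ties, so the tuples themselves never need to be comparable. The second technique in the abstract (partially precomputing future result changes) is not shown.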
Selecting Stars: The k Most Representative Skyline Operator
In Proc. of the Int. IEEE Conf. on Data Engineering (ICDE), 2007
Cited by 59 (2 self)
Skyline computation has many applications including multi-criteria decision making. In this paper, we study the problem of selecting k skyline points so that the number of points dominated by at least one of these k skyline points is maximized. We first present an efficient exact algorithm based on dynamic programming for 2D space. Then, we show that the problem is NP-hard when the dimensionality is 3 or more and that it can be approximately solved by a polynomial-time algorithm with the guaranteed approximation ratio 1 − 1/e. To speed up the computation, an efficient, scalable, index-based randomized algorithm is developed by applying the FM probabilistic counting technique. A comprehensive performance evaluation demonstrates that our randomized technique is very efficient, highly accurate, and scalable.
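An approximation ratio of 1 − 1/e is the classic guarantee of the greedy algorithm for maximum coverage. The sketch below illustrates that greedy idea on the problem as stated: repeatedly pick the skyline point that dominates the most not-yet-covered points. It assumes smaller values are better and uses brute-force dominance tests. It is not the paper's exact 2D dynamic program or its FM-sketch randomized algorithm, and the function names are hypothetical.

```python
def dominates(p, q):
    """p dominates q: no worse in every dimension, strictly better in one (smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def greedy_representative_skyline(points, k):
    """Hypothetical sketch: greedily pick up to k skyline points covering the most dominated points."""
    sky = [i for i, p in enumerate(points)
           if not any(dominates(q, p) for q in points)]
    # Indices of the points dominated by each skyline point.
    dom = {i: {j for j, q in enumerate(points) if dominates(points[i], q)} for i in sky}
    covered, chosen = set(), []
    for _ in range(min(k, len(sky))):
        # Marginal gain = number of dominated points not covered so far.
        best = max((i for i in sky if i not in chosen),
                   key=lambda i: len(dom[i] - covered))
        chosen.append(best)
        covered |= dom[best]
    return [points[i] for i in chosen], len(covered)


pts = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4), (6, 2), (2, 6), (3, 6)]
print(greedy_representative_skyline(pts, 1))  # ([(2, 2)], 5)
```

Here (2, 2) alone dominates every non-skyline point, so adding more representatives cannot increase the count for this input.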
SPARK: Top-k keyword query in relational databases
In Proceedings of SIGMOD, 2007
Cited by 56 (3 self)
With the increasing amount of text data stored in relational databases, there is a demand for RDBMSs to support keyword queries over text data. As a search result is often assembled from multiple relational tables, traditional IR-style ranking and query evaluation methods cannot be applied directly. In this paper, we study the effectiveness and the efficiency issues of answering top-k keyword queries in relational database systems. We propose a new ranking formula by adapting existing IR techniques based on a natural notion of virtual document. Compared with previous approaches, our new ranking method is simple yet effective, and agrees with human perceptions. We also study efficient query processing methods for the new ranking method, and propose algorithms that have minimal accesses to the database. We have conducted extensive experiments on large-scale real databases using two popular RDBMSs. The experimental results demonstrate significant improvement over the alternative approaches in terms of retrieval effectiveness and efficiency.
Stratified computation of skylines with partially-ordered domains
Proc. of the ACM SIGMOD Int'l Conf. on Management of Data, 2005
Cited by 56 (2 self)
In this paper, we study the evaluation of skyline queries with partially-ordered attributes. Because such attributes lack a total ordering, traditional index-based evaluation algorithms (e.g., NN and BBS) that are designed for totally-ordered attributes can no longer prune the space as effectively. Our solution is to transform each partially-ordered attribute into a two-integer domain that allows us to exploit index-based algorithms to compute skyline queries on the transformed space. Based on this framework, we propose three novel algorithms: BBS+ is a straightforward adaptation of BBS using the framework, and SDC (Stratification by Dominance Classification) and SDC+ are optimized to handle false positives and support progressive evaluation. Both SDC and SDC+ exploit a dominance relationship to organize the data into strata. While SDC generates its strata at runtime, SDC+ partitions the data into strata offline. We also design two dominance classification strategies (MinPC and MaxPC) to further optimize the performance of SDC and SDC+. We implemented the proposed schemes and evaluated their efficiency. Our results show that our proposed techniques outperform existing approaches by a wide margin, with SDC+MinPC giving the best performance in terms of both response time and progressiveness. To the best of our knowledge, this is the first paper to address the problem of skyline query evaluation involving partially-ordered attribute domains.
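The abstract assumes a dominance test over mixed attributes. The test itself can be illustrated with a naive nested-loop skyline over one totally-ordered attribute and one partially-ordered attribute. This sketch is not BBS+, SDC, or SDC+ and does not use the two-integer transformation. The (price, category) layout, the "preferred over" pair input format, and all function names are assumptions for illustration; the preference pairs are assumed to form a DAG.

```python
def transitive_closure(prefs):
    """Map each value to every value it is (transitively) preferred over."""
    better = {}
    for a, b in prefs:  # (a, b) means a is preferred over b (hypothetical format)
        better.setdefault(a, set()).add(b)
    changed = True
    while changed:
        changed = False
        for a in list(better):
            reach = set()
            for b in better[a]:
                reach |= better.get(b, set())
            if not reach <= better[a]:
                better[a] |= reach
                changed = True
    return better


def po_skyline(points, prefs):
    """Naive skyline of (price, category) points: smaller price is better,
    category is partially ordered by `prefs`."""
    better = transitive_closure(prefs)

    def prefers(x, y):
        return y in better.get(x, set())

    def dominates(p, q):
        cat_no_worse = p[1] == q[1] or prefers(p[1], q[1])
        strictly_better = p[0] < q[0] or prefers(p[1], q[1])
        return p[0] <= q[0] and cat_no_worse and strictly_better

    return [p for p in points if not any(dominates(q, p) for q in points)]


prefs = [("A", "B"), ("B", "C"), ("A", "D")]  # D is incomparable with B and C
pts = [(10, "C"), (12, "A"), (8, "D"), (9, "B"), (15, "B")]
print(po_skyline(pts, prefs))  # [(12, 'A'), (8, 'D'), (9, 'B')]
```

Because D and B are incomparable, (8, "D") cannot eliminate (9, "B") even though it is cheaper. Such incomparable pairs are why index-based pruning designed for totally-ordered attributes loses effectiveness.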
Finding k-dominant skylines in high-dimensional space
 SIGMOD
Cited by 55 (8 self)
Given a d-dimensional data set, a point p dominates another point q if it is better than or equal to q in all dimensions and better than q in at least one dimension. A point is a skyline point if there does not exist any point that can dominate it. Skyline queries, which return skyline points, are useful in many decision-making applications. Unfortunately, as the number of dimensions increases, the chance of one point dominating another point becomes very low. As such, the number of skyline points becomes too large to offer any interesting insights. To find more important and meaningful skyline points in high-dimensional space, we propose a new concept, called k-dominant skyline, which relaxes the idea of dominance to k-dominance. A point p is said to k-dominate another point q if there are k (≤ d) dimensions in which p is better than or equal to q and is better in at least one of these k dimensions. A point that is not k-dominated by any other point is in the k-dominant skyline. We prove various properties of the k-dominant skyline. In particular, because k-dominance is not transitive, existing skyline algorithms cannot be adapted for the k-dominant skyline. We then present several new algorithms for finding the k-dominant skyline and its variants. Extensive experiments show that our methods can answer different queries on both synthetic and real data sets efficiently.
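The k-dominance definition above translates directly into a brute-force check. It does not require enumerating k-subsets: p k-dominates q exactly when at least k dimensions have p no worse than q and at least one of those dimensions has p strictly better. In that case, that dimension plus any k − 1 other no-worse dimensions form the required subset. This is a minimal naive sketch, assuming smaller values are better; it is not one of the paper's algorithms, and the names are hypothetical.

```python
def k_dominates(p, q, k):
    """True if some k dimensions have p <= q, with p < q in at least one (smaller is better)."""
    no_worse = [i for i in range(len(p)) if p[i] <= q[i]]
    return len(no_worse) >= k and any(p[i] < q[i] for i in no_worse)


def k_dominant_skyline(points, k):
    """Naive O(n^2 d) baseline: keep points that no other point k-dominates."""
    return [p for i, p in enumerate(points)
            if not any(k_dominates(q, p, k) for j, q in enumerate(points) if j != i)]


cyc = [(1, 2, 3), (2, 3, 1), (3, 1, 2)]
print(k_dominant_skyline(cyc, 3))  # [(1, 2, 3), (2, 3, 1), (3, 1, 2)]
print(k_dominant_skyline(cyc, 2))  # []
```

With k = d, the check reduces to ordinary dominance, and all three points are skyline points. With k = 2, each point 2-dominates the next in a cycle, so the 2-dominant skyline is empty. This cycle is one concrete consequence of k-dominance not being transitive.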
Efficient Computation of the Skyline Cube
In VLDB, 2005
Cited by 50 (4 self)
Skyline has been proposed as an important operator for multi-criteria decision making, data mining and visualization, and user-preference queries. In this paper, we consider the problem of efficiently computing a Skycube, which consists of skylines of all possible non-empty subsets of a given set of dimensions. While existing skyline computation algorithms can be immediately extended to computing each skyline query independently, such "shared-nothing" algorithms are inefficient. We develop several computation sharing strategies based on effectively identifying the computation dependencies among multiple related skyline queries. Based on these sharing strategies, two novel algorithms, Bottom-Up and Top-Down, are proposed to compute the Skycube efficiently. Finally, our extensive performance evaluations confirm the effectiveness of the sharing strategies. It is ...
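For contrast with the sharing strategies, the "shared-nothing" baseline mentioned in the abstract can be sketched as follows. It runs an independent skyline computation for each of the 2^d − 1 non-empty dimension subsets. This is a minimal Python sketch, assuming smaller values are better and a brute-force skyline per subspace. It is not the Bottom-Up or Top-Down algorithm, and the function names are hypothetical.

```python
from itertools import combinations


def dominates_on(p, q, dims):
    """Dominance restricted to the subspace `dims` (smaller is better)."""
    return all(p[i] <= q[i] for i in dims) and any(p[i] < q[i] for i in dims)


def skycube(points):
    """Shared-nothing baseline: one independent skyline per non-empty dimension subset."""
    d = len(points[0])
    cube = {}
    for r in range(1, d + 1):
        for dims in combinations(range(d), r):
            cube[dims] = [i for i, p in enumerate(points)
                          if not any(dominates_on(q, p, dims) for q in points)]
    return cube


print(skycube([(1, 3), (2, 2), (3, 1)]))
# {(0,): [0], (1,): [2], (0, 1): [0, 1, 2]}
```

Every cuboid repeats the full pairwise scan, with no results reused across related subspaces. That repeated work is the inefficiency the sharing strategies target.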
The spatial skyline queries
In VLDB, 2006
Cited by 50 (6 self)
In this paper, for the first time, we introduce the concept of Spatial Skyline Queries (SSQ). Given a set of data points P and a set of query points Q, each data point has a number of derived spatial attributes, each of which is the point’s distance to a query point. An SSQ retrieves those points of P which are not dominated by any other point in P considering their derived spatial attributes. The main difference from the regular skyline query is that this spatial domination depends on the location of the query points Q. SSQ has applications in several domains such as emergency response and online maps. The main intuition and novelty behind our approaches is that we exploit the geometric properties of the SSQ problem space to avoid the exhaustive examination of all the point pairs in P and Q. Consequently, we reduce the complexity of SSQ search from O(|P|^2 |Q|) to ...
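The exhaustive baseline that the paper improves on can be sketched directly from the definition. The sketch derives one distance attribute per query point and then compares every pair of data points, which takes O(|P|^2 |Q|) time. It assumes Euclidean distance in the plane and uses `math.dist` (Python 3.8+). The function name is hypothetical, and the sketch does not use the paper's geometric pruning.

```python
import math


def spatial_skyline(P, Q):
    """Naive SSQ: points of P not spatially dominated w.r.t. query points Q."""
    # Derived attributes: Euclidean distance from each data point to each query point.
    derived = [[math.dist(p, q) for q in Q] for p in P]

    def dominates(a, b):
        # a is no farther from every query point and strictly closer to at least one.
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    return [P[i] for i in range(len(P))
            if not any(dominates(derived[j], derived[i]) for j in range(len(P)) if j != i)]


P = [(2, 0), (2, 3), (5, 5), (1, 0)]
Q = [(0, 0), (4, 0)]
print(spatial_skyline(P, Q))  # [(2, 0), (1, 0)]
```

Moving the query points changes every derived attribute and therefore the result. This dependence on Q is what distinguishes SSQ from a regular skyline over static attributes.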
Multi-objective query processing for database systems
In International Conference on Very Large Data Bases (VLDB), 2004
Cited by 50 (10 self)
Query processing in database systems has developed beyond mere exact matching of attribute values. Scoring database objects and retrieving only the top k matches or Pareto-optimal result sets (skyline queries) are already common for a variety of applications. Specialized algorithms using either paradigm can avoid naïve linear database scans and thus improve scalability. However, these paradigms are only two extreme cases of exploring viable compromises for each user’s objectives. To find the correct result set for arbitrary cases of multi-objective query processing in databases we will present a novel algorithm for computing sets of objects that are non-dominated with respect to a set of monotonic objective functions. Naturally containing top k and skyline retrieval paradigms as special cases, this algorithm maintains scalability also for all cases in between. Moreover, we will show the algorithm’s correctness and instance-optimality in terms of necessary object accesses and how the response behavior can be improved by progressively producing result objects as quickly as possible, while the algorithm is still running.
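The target result set, objects that are non-dominated with respect to a set of monotonic objective functions, can be computed naively by scoring every object under every function and filtering pairwise. The sketch below does this and shows two special cases. A single objective yields the best-scoring objects (the k = 1 case of top-k). One projection per attribute yields the classic skyline. It assumes larger scores are better and is a linear-scan baseline, not the paper's instance-optimal algorithm; the function name is hypothetical.

```python
def nondominated(objects, objectives):
    """Objects not dominated under a list of monotonic objective functions (larger is better)."""
    scores = [tuple(f(o) for f in objectives) for o in objects]

    def dominates(a, b):
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    return [o for o, s in zip(objects, scores)
            if not any(dominates(t, s) for t in scores)]


objs = [(1, 4), (3, 3), (4, 1), (2, 2)]
# One aggregate objective: the top-scoring object(s).
print(nondominated(objs, [lambda o: o[0] + o[1]]))  # [(3, 3)]
# One objective per attribute: the skyline.
print(nondominated(objs, [lambda o: o[0], lambda o: o[1]]))  # [(1, 4), (3, 3), (4, 1)]
```

Any mix in between, such as two different weighted sums, gives a result between these two extremes.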
Semantics of ranking queries for probabilistic data and expected ranks
In Proc. of ICDE’09, 2009
Cited by 47 (1 self)
When dealing with massive quantities of data, top-k queries are a powerful technique for returning only the k most relevant tuples for inspection, based on a scoring function. The problem of efficiently answering such ranking queries has been studied and analyzed extensively within traditional database settings. The importance of the top-k is perhaps even greater in probabilistic databases, where a relation can encode exponentially many possible worlds. There have been several recent attempts to propose definitions and algorithms for ranking queries over probabilistic data. However, these all lack many of the intuitive properties of a top-k over deterministic data. Specifically, we define a number of fundamental properties, including exact-k, containment, unique-rank, value-invariance, and stability, which are all satisfied by ranking queries on certain data. We argue that all these conditions should also be fulfilled by any reasonable definition for ranking uncertain data. Unfortunately, none of the existing definitions is able to achieve this. To remedy this shortcoming, this work proposes an intuitive new approach of expected rank. This uses the well-founded notion of the expected rank of each tuple across all possible worlds as the basis of the ranking. We are able to prove that, in contrast to all existing approaches, the expected rank satisfies all the required properties for a ranking query. We provide efficient solutions to compute this ranking across the major models of uncertain data, such as attribute-level and tuple-level uncertainty. For an uncertain relation of N tuples, the processing cost is O(N log N), no worse than simply sorting the relation. In settings where there is a high cost for generating each tuple in turn, we provide pruning techniques based on probabilistic tail bounds that can terminate the search early and guarantee that the top-k has been found. Finally, a comprehensive experimental study confirms the effectiveness of our approach.
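Under the simplest tuple-level model, each tuple appears independently with its own probability. This independence is an assumption; the paper also covers attribute-level uncertainty and correlated tuples. In that model the expected rank has a closed form, assuming the convention that a present tuple's rank is the number of present tuples with a higher score and an absent tuple's rank is |W|, the size of the world. Sorting by score and keeping a running sum of the probabilities of higher-scored tuples gives the O(N log N) cost quoted above. The sketch checks that closed form against brute-force enumeration of all possible worlds. Both function names are hypothetical, ranks are 0-based, and scores are assumed distinct.

```python
from itertools import product


def expected_ranks(tuples):
    """Closed form for independent tuples given as (score, prob), in O(N log N)."""
    total = sum(p for _, p in tuples)  # E[|W|]
    order = sorted(range(len(tuples)), key=lambda i: tuples[i][0], reverse=True)
    ranks = [0.0] * len(tuples)
    higher = 0.0  # summed probability of tuples with a strictly higher score
    for i in order:
        _, p = tuples[i]
        # Present: E[# higher tuples present]. Absent: E[|W|] without this tuple.
        ranks[i] = p * higher + (1 - p) * (total - p)
        higher += p
    return ranks


def expected_ranks_bruteforce(tuples):
    """Enumerate all 2^N possible worlds; only for checking small inputs."""
    n = len(tuples)
    ranks = [0.0] * n
    for present in product([False, True], repeat=n):
        pr = 1.0
        for (_, p), x in zip(tuples, present):
            pr *= p if x else 1 - p
        world = [j for j in range(n) if present[j]]
        for i in range(n):
            if present[i]:
                r = sum(1 for j in world if tuples[j][0] > tuples[i][0])
            else:
                r = len(world)
            ranks[i] += pr * r
    return ranks


rel = [(100, 0.5), (90, 0.8), (80, 1.0)]
print([round(r, 6) for r in expected_ranks(rel)])  # [0.9, 0.7, 1.3]
```

The tuple with score 90 gets a smaller expected rank than the tuple with score 100 because it is much more likely to exist. The ranking therefore depends on both score and probability.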
An efficient and scalable approach to CNN queries in a road network
In Proc. of VLDB, 2005
Cited by 45 (0 self)
A continuous search in a road network retrieves the objects which satisfy a query condition at any point on a path. For example, return the three nearest restaurants from all locations on my route from point s to point e. In this paper, we deal with NN queries as well as continuous NN queries in the context of moving objects databases. The performance of existing approaches based on the network distance, such as the shortest path length, depends largely on the density of objects of interest. To overcome this problem, we propose UNICONS (a unique continuous search algorithm) for NN queries and CNN queries performed on a network. We incorporate the use of precomputed NN lists into Dijkstra’s algorithm for NN queries. A mathematical rationale is employed to produce the final results of CNN queries. Experimental results for real-life datasets of various sizes show that UNICONS outperforms its competitors by up to 3.5 times for NN queries and 5 times for CNN queries depending on the density of objects and the number of NNs required.
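As a reference point for the Dijkstra-based NN search mentioned above, the plain network-expansion approach to a k-NN query grows shortest paths outward from the query node and stops once k object-hosting nodes have been settled. This sketch omits UNICONS' precomputed NN lists and its CNN processing. It assumes objects sit on graph nodes and edge weights are non-negative lengths; `network_knn` and `undirected` are hypothetical helpers.

```python
import heapq


def undirected(edges):
    """Build an adjacency list {node: [(neighbor, length), ...]} from (u, v, length) triples."""
    g = {}
    for u, v, w in edges:
        g.setdefault(u, []).append((v, w))
        g.setdefault(v, []).append((u, w))
    return g


def network_knn(graph, source, objects, k):
    """Dijkstra expansion from `source`; return the k nearest object nodes with network distances."""
    dist = {source: 0}
    heap = [(0, source)]
    settled, result = set(), []
    while heap and len(result) < k:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale heap entry
        settled.add(u)
        if u in objects:
            result.append((u, d))
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return result


g = undirected([("s", "a", 2), ("s", "b", 5), ("a", "c", 1), ("b", "c", 1), ("c", "d", 4)])
print(network_knn(g, "s", {"b", "c", "d"}, 2))  # [('c', 3), ('b', 4)]
```

How far this expansion must search depends on how densely objects are spread over the network. Sparse objects force it to settle many nodes, which is the density sensitivity the abstract attributes to network-distance approaches.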