Results 1  10
of
190
Continuous monitoring of topk queries over sliding windows
 In SIGMOD
, 2006
"... Given a dataset P and a preference function f, atopk query retrieves the k tuples in P with the highest scores according to f. Even though the problem is wellstudied in conventional databases, the existing methods are inapplicable to highly dynamic environments involving numerous longrunning queri ..."
Abstract

Cited by 79 (8 self)
 Add to MetaCart
(Show Context)
Given a dataset P and a preference function f, atopk query retrieves the k tuples in P with the highest scores according to f. Even though the problem is wellstudied in conventional databases, the existing methods are inapplicable to highly dynamic environments involving numerous longrunning queries. This paper studies continuous monitoring of topk queries over a fixedsize window W of the most recent data. The window size can be expressed either in terms of the number of active tuples or time units. We propose a general methodology for topk monitoring that restricts processing to the subdomains of the workspace that influence the result of some query. To cope with high stream rates and provide fast answers in an online fashion, the data in W reside in main memory. The valid records are indexed by a grid structure, which also maintains bookkeeping information. We present two processing techniques: the first one computes the new answer of a query whenever some of the current topk points expire; the second one partially precomputes the future changes in the result, achieving better running time at the expense of slightly higher space requirements. We analyze the performance of both algorithms and evaluate their efficiency through extensive experiments. Finally, we extend the proposed framework to other query types and a different data stream model. 1.
Finding kdominant skylines in high dimensional space
 SIGMOD
"... Given a ddimensional data set, a point p dominates another point q if it is better than or equal to q in all dimensions and better than q in at least one dimension. A point is a skyline point if there does not exists any point that can dominate it. Skyline queries, which return skyline points, are ..."
Abstract

Cited by 73 (9 self)
 Add to MetaCart
(Show Context)
Given a ddimensional data set, a point p dominates another point q if it is better than or equal to q in all dimensions and better than q in at least one dimension. A point is a skyline point if there does not exists any point that can dominate it. Skyline queries, which return skyline points, are useful in many decision making applications. Unfortunately, as the number of dimensions increases, the chance of one point dominating another point is very low. As such, the number of skyline points become too numerous to offer any interesting insights. To find more important and meaningful skyline points in high dimensional space, we propose a new concept, called kdominant skyline which relaxes the idea of dominance to kdominance. A point p is said to kdominate another point q if there are k ( ≤ d) dimensions in which p is better than or equal to q and is better in at least one of these k dimensions. A point that is not kdominated by any other points is in the kdominant skyline. We prove various properties of kdominant skyline. In particular, because kdominant skyline points are not transitive, existing skyline algorithms cannot be adapted for kdominant skyline. We then present several new algorithms for finding kdominant skyline and its variants. Extensive experiments show that our methods can answer different queries on both synthetic and real data sets efficiently.
The spatial skyline queries
 In VLDB
, 2006
"... In this paper, for the first time, we introduce the concept of Spatial Skyline Queries (SSQ). Given a set of data points P and a set of query points Q, each data point has a number of derived spatial attributes each of which is the point’s distance to a query point. An SSQ retrieves those points of ..."
Abstract

Cited by 73 (7 self)
 Add to MetaCart
(Show Context)
In this paper, for the first time, we introduce the concept of Spatial Skyline Queries (SSQ). Given a set of data points P and a set of query points Q, each data point has a number of derived spatial attributes each of which is the point’s distance to a query point. An SSQ retrieves those points of P which are not dominated by any other point in P considering their derived spatial attributes. The main difference with the regular skyline query is that this spatial domination depends on the location of the query points Q. SSQ has application in several domains such as emergency response and online maps. The main intuition and novelty behind our approaches is that we exploit the geometric properties of the SSQ problem space to avoid the exhaustive examination of all the point pairs in P and Q. Consequently, we reduce the complexity of SSQ search from O(P  2 Q) to
SUBSKY: Efficient computation of skylines in subspaces
 In ICDE
, 2006
"... Given a set of multidimensional points, the skyline contains the best points according to any preference function that is monotone on all axes. In practice, applications that require skyline analysis usually provide numerous candidate attributes, and various users depending on their interests may i ..."
Abstract

Cited by 47 (7 self)
 Add to MetaCart
(Show Context)
Given a set of multidimensional points, the skyline contains the best points according to any preference function that is monotone on all axes. In practice, applications that require skyline analysis usually provide numerous candidate attributes, and various users depending on their interests may issue queries regarding different (small) subsets of the dimensions. Formally, given a relation with a large number (e.g.,> 10) of attributes, a query aims at finding the skyline in an arbitrary subspace with a low dimensionality (e.g., 2). The existing algorithms do not support subspace skyline retrieval efficiently because they (i) require scanning the entire database at least once, or (ii) are optimized for one particular subspace but incur significant overhead for other subspaces. In this paper, we propose a technique SUBSKY which settles the problem using a single Btree, and can be implemented in any relational database. The core of SUBSKY is a transformation that converts multidimensional data to 1D values, and enables several effective pruning heuristics. Extensive experiments with real data confirm that SUBSKY outperforms alternative approaches significantly in both efficiency and scalability. 1
Efficient Skyline Computation over LowCardinality Domains
, 2007
"... Current skyline evaluation techniques follow a common paradigm that eliminates data elements from skyline consideration by finding other elements in the dataset that dominate them. The performance of such techniques is heavily influenced by the underlying data distribution (i.e. whether the dataset ..."
Abstract

Cited by 47 (1 self)
 Add to MetaCart
(Show Context)
Current skyline evaluation techniques follow a common paradigm that eliminates data elements from skyline consideration by finding other elements in the dataset that dominate them. The performance of such techniques is heavily influenced by the underlying data distribution (i.e. whether the dataset attributes are correlated, independent, or anticorrelated). In this paper, we propose the Lattice Skyline Algorithm (LS) that is built around a new paradigm for skyline evaluation on datasets with attributes that are drawn from lowcardinality domains. LS continues to apply even if one attribute has high cardinality. Many skyline applications naturally have such data characteristics, and previous skyline methods have not exploited this property. We show that for typical dimensionalities, the complexity of LS is linear in the number of input tuples. Furthermore, we show that the performance of LS is independent of the input data distribution. Finally, we demonstrate through extensive experimentation on both real and synthetic datasets that LS can result in a significant performance advantage over existing techniques.
Relaxing join and selection queries
 In VLDB ’06: Proceedings of the 32nd International Conference on Very Large Data Bases
, 2006
"... Database users can be frustrated by having an empty answer to a query. In this paper, we propose a framework to systematically relax queries involving joins and selections. When considering relaxing a query condition, intuitively one seeks the ’minimal ’ amount of relaxation that yields an answer. W ..."
Abstract

Cited by 38 (4 self)
 Add to MetaCart
(Show Context)
Database users can be frustrated by having an empty answer to a query. In this paper, we propose a framework to systematically relax queries involving joins and selections. When considering relaxing a query condition, intuitively one seeks the ’minimal ’ amount of relaxation that yields an answer. We first characterize the types of answers that we return to relaxed queries. We then propose a lattice based framework in order to aid query relaxation. Nodes in the lattice correspond to different ways to relax queries. We characterize the properties of relaxation at each node and present algorithms to compute the corresponding answer. We then discuss how to traverse this lattice in a way that a nonempty query answer is obtained with the minimum amount of query condition relaxation. We implemented this framework and we present our results of a thorough performance evaluation using real and synthetic data. Our results indicate the practical utility of our framework. 1.
Algorithms and Analyses for Maximal Vector Computation
"... The maximal vector problem is to identify the maximals over a collection of vectors. This arises in many contexts and, as such, has been well studied. The problem recently gained renewed attention with skyline queries for relational databases and with work to develop skyline algorithms that are exte ..."
Abstract

Cited by 36 (0 self)
 Add to MetaCart
(Show Context)
The maximal vector problem is to identify the maximals over a collection of vectors. This arises in many contexts and, as such, has been well studied. The problem recently gained renewed attention with skyline queries for relational databases and with work to develop skyline algorithms that are external and relationally well behaved. While many algorithms have been proposed, how they perform has been unclear. We study the performance of, and design choices behind, these algorithms. We prove runtime bounds based on the number of vectors n and the dimensionality k. Early algorithms based on divideandconquer established seemingly good average and worstcase asymptotic runtimes. In fact, the problem can be solved in O(n) averagecase (holding k as fixed). We prove, however, that the performance is quite bad with respect to k. We demonstrate that the more recent skyline algorithms are better behaved, and can also achieve O(kn) averagecase. While k matters for these, in practice, its effect vanishes in the asymptotic. We introduce a new external algorithm, LESS, that is more efficient and better behaved. We evaluate LESS’s effectiveness and improvement over the field, and prove that its averagecase running time is O(kn). 1
An energyefficient mobile recommender system
 In Proc. KDD 2010
"... The increasing availability of largescale location traces creates unprecedent opportunities to change the paradigm for knowledge discovery in transportation systems. A particularly promising area is to extract energyefficient transportation patterns (green knowledge), which can be used as guidance ..."
Abstract

Cited by 35 (3 self)
 Add to MetaCart
(Show Context)
The increasing availability of largescale location traces creates unprecedent opportunities to change the paradigm for knowledge discovery in transportation systems. A particularly promising area is to extract energyefficient transportation patterns (green knowledge), which can be used as guidance for reducing inefficiencies in energy consumption of transportation sectors. However, extracting green knowledge from location traces is not a trivial task. Conventional data analysis tools are usually not customized for handling the massive quantity, complex, dynamic, and distributed nature of location traces. To that end, in this paper, we provide a focused study of extracting energyefficient transportation patterns from location traces. Specifically, we have the initial focus on a sequence of mobile recommendations. As a case study, we develop a mobile recommender system which has the ability in recommending a sequence of pickup points for taxi drivers or a sequence of potential parking positions. The goal of this mobile recommendation system is to maximize the probability of business success. Along this line, we provide a Potential Travel Distance (PTD) function for evaluating each candidate sequence. This PTD function possesses a monotone property which can be used to effectively prune the search space. Based on this PTD function, we develop two algorithms, LCP and SkyRoute, for finding the recommended routes. Finally, experimental results show that the proposed system can provide effective mobile sequential recommendation and the knowledge extracted from location traces can be used for coaching drivers and leading to the efficient use of energy.
Selecting skyline services for qosbased web service composition
 in Proceedings of the 19th international conference on World wide web. ACM
"... Web service composition enables seamless and dynamic integration of business applications on the web. The performance of the composed application is determined by the performance of the involved web services. Therefore, nonfunctional, quality of service (QoS) aspects (e.g. response time, availabi ..."
Abstract

Cited by 33 (0 self)
 Add to MetaCart
(Show Context)
Web service composition enables seamless and dynamic integration of business applications on the web. The performance of the composed application is determined by the performance of the involved web services. Therefore, nonfunctional, quality of service (QoS) aspects (e.g. response time, availability, etc.) are crucial for selecting the web services to take part in the composition. The problem of identifying the best candidate web services from a set of functionallyequivalent services is a multicriteria decision making problem. The selected services should optimize the overall QoS of the composed application, while satisfying all the constraints specified by the client on individual QoS parameters. In this paper, we propose an approach based on the notion of skyline to effectively and efficiently select services for composition, reducing the number of candidate services to be considered. We also discuss how a provider can improve its service to become more competitive and increase its potential of being included in composite applications. We evaluate our approach experimentally using both real and synthetically generated datasets.
Skypeer: Efficient subspace skyline computation over distributed data
 In ICDE
, 2007
"... Skyline query processing has received considerable attention in the recent past. Mainly, the skyline query is used to find a set of non dominated data points in a multidimensional dataset. While most previous work has assumed a centralized setting, in this paper we address the efficient computati ..."
Abstract

Cited by 31 (10 self)
 Add to MetaCart
(Show Context)
Skyline query processing has received considerable attention in the recent past. Mainly, the skyline query is used to find a set of non dominated data points in a multidimensional dataset. While most previous work has assumed a centralized setting, in this paper we address the efficient computation of subspace skyline queries in largescale peertopeer (P2P) networks, where the dataset is horizontally distributed across the peers. Relying on a superpeer architecture we propose a threshold based algorithm, called SKYPEER, which forwards the skyline query requests among peers, in such a way that the amount of transferred data is significantly reduced. For efficient subspace skyline processing, we extend the notion of domination by defining the extended skyline set, which contains all data elements that are necessary to answer a skyline query in any arbitrary subspace. We prove that our algorithm provides the exact answers and we present optimization techniques to reduce communication cost and execution time. Finally, we provide an extensive experimental evaluation showing that SKYPEER performs efficiently and provides a viable solution when a large degree of distribution is required. 1.