Results 1 - 10
of
20
Probabilistic skylines on uncertain data
- In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB’07), Viena
, 2007
"... Uncertain data are inherent in some important applications. Although a considerable amount of research has been dedicated to modeling uncertain data and answering some types of queries on uncertain data, how to conduct advanced analysis on uncertain data remains an open problem at large. In this pap ..."
Abstract
-
Cited by 39 (10 self)
- Add to MetaCart
Uncertain data are inherent in some important applications. Although a considerable amount of research has been dedicated to modeling uncertain data and answering some types of queries on uncertain data, how to conduct advanced analysis on uncertain data remains an open problem at large. In this paper, we tackle the problem of skyline analysis on uncertain data. We propose a novel probabilistic skyline model where an uncertain object may take a probability to be in the skyline, and a p-skyline contains all the objects whose skyline probabilities are at least p. Computing probabilistic skylines on large uncertain data sets is challenging. We develop two efficient algorithms. The bottom-up algorithm computes the skyline probabilities of some selected instances of uncertain objects, and uses those instances to prune other instances and uncertain objects effectively. The top-down algorithm recursively partitions the instances of uncertain objects into subsets, and prunes subsets and objects aggressively. Our experimental results on both the real NBA player data set and the benchmark synthetic data sets show that probabilistic skylines are interesting and useful, and our two algorithms are efficient on large data sets, and complementary to each other in performance. 1.
Efficient skyline query processing on peer-to-peer networks
- In IEEE International Conference on Data Engineering (ICDE) (2007
, 2007
"... Skyline query has been gaining much interest in database research communities in recent years. Most existing studies focus mainly on centralized systems, and resolving the problem in a distributed environment such as a peer-to-peer (P2P) network is still an emerging topic. The desiderata of efficien ..."
Abstract
-
Cited by 17 (2 self)
- Add to MetaCart
Skyline query has been gaining much interest in database research communities in recent years. Most existing studies focus mainly on centralized systems, and resolving the problem in a distributed environment such as a peer-to-peer (P2P) network is still an emerging topic. The desiderata of efficient skyline querying in P2P environment include: 1) progressive returning of answers, 2) low processing cost in terms of number of peers accessed and search messages, 3) balanced query loads among the peers. In this paper, we propose a solution that satisfies the three desiderata. Our solution is based on a balanced tree structured P2P network. By partitioning the skyline search space adaptively based on query accessing patterns, we are able to alleviate the problem of “hot ” spots present in the skyline query processing. By being able to estimate the peer nodes within the query subspaces, we are able to control the amount of query forwarding, limiting the number of peers involved and the amount of messages transmitted in the network. Load balancing is achieved in query load conscious data space splitting/merging during the joining/departure of nodes and through dynamic load migration. Experiments on real and synthetic datasets confirm the effectiveness and scalability of our algorithm on P2P networks. 1.
Deltasky: Optimal maintenance of skyline deletions without exclusive dominance region generation
- In UCSB Tech Report, 2006. http://www.cs.ucsb.edu/ ∼ pingwu/ deltasky.pdf
, 2007
"... This paper addresses the problem of efficient maintenance of a materialized skyline view in response to skyline removals. While there has been significant progress on skyline query computation, an equally important but largely unanswered issue is on the incremental maintenance for skyline deletions. ..."
Abstract
-
Cited by 15 (1 self)
- Add to MetaCart
This paper addresses the problem of efficient maintenance of a materialized skyline view in response to skyline removals. While there has been significant progress on skyline query computation, an equally important but largely unanswered issue is on the incremental maintenance for skyline deletions. Previous work suggested the use of the so called exclusive dominance region (EDR) to achieve optimal I/O performance for deletion maintenance. However, the shape of an EDR becomes extremely complex in higher dimensions, and algorithms for its computation have not been developed. We derive a systematic way to decompose a d-dimensional EDR into a collection of hyper-rectangles. We show that the number of such hyper-rectangles is O(m d), where m is the current skyline result size. We then propose a novel algorithm DeltaSky which determines whether an intermediate R-tree MBR intersects with the EDR without explicitly calculating the EDR itself. This reduces the worse case complexity of the EDR intersection check from O(m d) to O(md). Thus DeltaSky helps the branch and bound skyline algorithm achieve I/O optimality for deletion maintenance by finding only the newly appeared skyline points after the deletion. We discuss implementation issues and show that DeltaSky can be efficiently implemented using one extra B-Tree. Moreover, we propose two optimization techniques which further reduce the average cost in practice. Extensive experiments demonstrate that DeltaSky achieves orders of magnitude performance gain over alternative solutions. 1
SKYPEER: Efficient subspace skyline computation over distributed data
- In Proceedings of ICDE’07
, 2007
"... Skyline query processing has received considerable attention in the recent past. Mainly, the skyline query is used to find a set of non dominated data points in a multidimensional dataset. While most previous work has assumed a centralized setting, in this paper we address the efficient computation ..."
Abstract
-
Cited by 13 (2 self)
- Add to MetaCart
Skyline query processing has received considerable attention in the recent past. Mainly, the skyline query is used to find a set of non dominated data points in a multidimensional dataset. While most previous work has assumed a centralized setting, in this paper we address the efficient computation of subspace skyline queries in largescale peer-to-peer (P2P) networks, where the dataset is horizontally distributed across the peers. Relying on a superpeer architecture we propose a threshold based algorithm, called SKYPEER, which forwards the skyline query requests among peers, in such a way that the amount of transferred data is significantly reduced. For efficient subspace skyline processing, we extend the notion of domination by defining the extended skyline set, which contains all data elements that are necessary to answer a skyline query in any arbitrary subspace. We prove that our algorithm provides the exact answers and we present optimization techniques to reduce communication cost and execution time. Finally, we provide an extensive experimental evaluation showing that SKYPEER performs efficiently and provides a viable solution when a large degree of distribution is required. 1.
Scalable Skyline Computation Using Object-based Space Partitioning
"... The skyline operator returns from a set of multi-dimensional objects a subset of superior objects that are not dominated by others. This operation is considered very important in multi-objective analysis of large datasets. Although a large number of skyline methods have been proposed, the majority o ..."
Abstract
-
Cited by 9 (1 self)
- Add to MetaCart
The skyline operator returns from a set of multi-dimensional objects a subset of superior objects that are not dominated by others. This operation is considered very important in multi-objective analysis of large datasets. Although a large number of skyline methods have been proposed, the majority of them focuses on minimizing the I/O cost. However, in high dimensional spaces, the problem can easily become CPU-bound due to the large number of computations required for comparing objects with current skyline points while scanning the database. Based on this observation, we propose a dynamic indexing technique for skyline points that can be integrated into state-of-the-art sort-based skyline algorithms to boost their computational performance. The new indexing and dominance checking approach is supported by a theoretical analysis, while our experiments show that it scales well with the input size and dimensionality not only because unnecessary dominance checks are avoided but also because it allows efficient dominance checking with the help of bitwise operations.
Parallel Skyline Computation on Multicore Architectures
"... With the advent of multicore processors, it has become imperative to write parallel programs if one wishes to exploit the next generation of processors. This paper deals with skyline computation as a case study of parallelizing database operations on multicore architectures. We compare two parallel ..."
Abstract
-
Cited by 8 (1 self)
- Add to MetaCart
With the advent of multicore processors, it has become imperative to write parallel programs if one wishes to exploit the next generation of processors. This paper deals with skyline computation as a case study of parallelizing database operations on multicore architectures. We compare two parallel skyline algorithms: a parallel version of the branch-and-bound algorithm (BBS) and a new parallel algorithm based on skeletal parallel programming. Experimental results show despite its simple design, the new parallel algorithm is comparable to parallel BBS in speed. For sequential skyline computation, the new algorithm far outperforms sequential BBS when the density of skyline tuples is low.
Processing relaxed skylines in PDMS using distributed data summaries
- IN PROCEEDINGS OF CIKM’06
, 2006
"... Peer Data Management Systems (PDMS) are a natural extension of heterogeneous database systems. One of the main tasks in such systems is efficient query processing. Insisting on complete answers, however, leads to asking almost every peer in the network. Relaxing these completeness requirements by ap ..."
Abstract
-
Cited by 6 (1 self)
- Add to MetaCart
Peer Data Management Systems (PDMS) are a natural extension of heterogeneous database systems. One of the main tasks in such systems is efficient query processing. Insisting on complete answers, however, leads to asking almost every peer in the network. Relaxing these completeness requirements by applying approximate query answering techniques can significantly reduce costs. Since most users are not interested in the exact answers to their queries, rank-aware query operators like top-k or skyline play an important role in query processing. In this paper, we present the novel concept of relaxed skylines that combines the advantages of both rank-aware query operators and approximate query processing techniques. Furthermore, we propose a strategy for processing relaxed skylines in distributed environments that allows for giving guarantees for the completeness of the result using distributed data summaries as routing indexes.
Minimizing the communication cost for continuous skyline maintenance
- In SIGMOD Conference
, 2009
"... Existing work in the skyline literature focuses on optimizing the processing cost. This paper aims at minimization of the communication overhead in client-server architectures, where a server continuously maintains the skyline of dynamic objects. Our first contribution is a Filter method that avoids ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
Existing work in the skyline literature focuses on optimizing the processing cost. This paper aims at minimization of the communication overhead in client-server architectures, where a server continuously maintains the skyline of dynamic objects. Our first contribution is a Filter method that avoids transmission of updates from objects that cannot influence the skyline. Specifically, each object is assigned a filter so that it needs to issue an update only if it violates its filter. Filter achieves significant savings over the naive approach of transmitting all updates. Going one step further, we introduce the concept of frequent skyline query over a sliding window (FSQW). The motivation is that snapshot skylines are not very useful in streaming environments because they keep changing over time. Instead, FSQW reports the objects that appear in the skylines of at least θ·s of the s most recent timestamps (0 < θ ≤ 1). Filter can be easily adapted to FSQW processing, however, with potentially high overhead for large and frequently updated datasets. To further reduce the communication cost, we propose a Sampling method, which returns approximate FSQW results without computing each snapshot skyline. Finally, we integrate Filter and Sampling in a Hybrid approach that combines their individual advantages.
On domination game analysis for microeconomic data mining
- TKDD
"... Game theory is a powerful tool for the analysis of the competitions among manufacturers in a market. In this paper, we present a study on combining game theory and data mining by introducing the concept of domination game analysis. We present a multidimensional market model, where every dimension re ..."
Abstract
-
Cited by 4 (2 self)
- Add to MetaCart
Game theory is a powerful tool for the analysis of the competitions among manufacturers in a market. In this paper, we present a study on combining game theory and data mining by introducing the concept of domination game analysis. We present a multidimensional market model, where every dimension represents one attribute of a commodity. Every product or customer is represented by a point in the multidimensional space, and a product is said to “dominate ” a customer if all of its attributes can satisfy the requirements of the customer. The expected market share of a product is measured by the expected number of the buyers in the customers, all of which are equally likely to buy any product dominating him. A Nash Equilibrium is a configuration of the products achieving stable expected market shares for all products. We prove that Nash Equilibrium in such a model can be computed in polynomial time if every manufacturer tries to modify its product in a round robin manner. To further improve the efficiency of the computation, we also design two algorithms for the manufacturers to efficiently find their best response to other products in the market.
Distributed Skyline Retrieval with Low Bandwidth Consumption
"... We consider skyline computation when the underlying dataset is horizontally partitioned onto geographically distant servers that are connected to the Internet. The existing solutions are not suitable for our problem, because they have at least one of the following drawbacks: (i) applicable only to d ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
We consider skyline computation when the underlying dataset is horizontally partitioned onto geographically distant servers that are connected to the Internet. The existing solutions are not suitable for our problem, because they have at least one of the following drawbacks: (i) applicable only to distributed systems adopting vertical partitioning or restricted horizontal partitioning, (ii) effective only when each server has limited computing and communication abilities, and (iii) optimized only for skyline search in subspaces but inefficient in the full space. This paper proposes an algorithm, called feedback-based distributed skyline (FDS), to support arbitrary horizontal partitioning. FDS aims at minimizing the network bandwidth, measured in the number of tuples transmitted over the network. The core of FDS is a novel feedback-driven mechanism, where the coordinator iteratively transmits certain feedback to each participant. Participants can leverage such information to prune a large amount of local data, which otherwise would need to be sent to the coordinator. Extensive experimentation confirms that FDS significantly outperforms alternative approaches in both effectiveness and progressiveness.

