Results 1 - 10 of 26
Probabilistic skylines on uncertain data
- In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB’07), Vienna, 2007
- Cited by 39 (10 self)
Uncertain data are inherent in some important applications. Although a considerable amount of research has been dedicated to modeling uncertain data and answering some types of queries on uncertain data, how to conduct advanced analysis on uncertain data remains largely an open problem. In this paper, we tackle the problem of skyline analysis on uncertain data. We propose a novel probabilistic skyline model where each uncertain object has a probability of being in the skyline, and a p-skyline contains all the objects whose skyline probabilities are at least p. Computing probabilistic skylines on large uncertain data sets is challenging. We develop two efficient algorithms. The bottom-up algorithm computes the skyline probabilities of some selected instances of uncertain objects, and uses those instances to prune other instances and uncertain objects effectively. The top-down algorithm recursively partitions the instances of uncertain objects into subsets, and prunes subsets and objects aggressively. Our experimental results on both the real NBA player data set and the benchmark synthetic data sets show that probabilistic skylines are interesting and useful, and that our two algorithms are efficient on large data sets and complementary to each other in performance.
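The p-skyline definition above can be made concrete with a brute-force sketch. It assumes a discrete model in which each uncertain object is a list of equally likely instances, objects are independent, and smaller values are better; none of these assumptions is stated in the abstract, and the code is neither the bottom-up nor the top-down algorithm. All names are hypothetical.

```python
from typing import Dict, List, Sequence

Point = Sequence[float]

def dominates(a: Point, b: Point) -> bool:
    """a dominates b: no worse in every dimension, strictly better in one.
    Assumption: smaller values are better."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline_probability(name: str, objects: Dict[str, List[Point]]) -> float:
    """Average, over the instances of `name`, of the probability that no
    other object takes an instance dominating it (assumed model)."""
    instances = objects[name]
    total = 0.0
    for u in instances:
        prob = 1.0
        for other, other_instances in objects.items():
            if other == name:
                continue
            dominating = sum(dominates(v, u) for v in other_instances)
            prob *= 1.0 - dominating / len(other_instances)
        total += prob
    return total / len(instances)

def p_skyline(objects: Dict[str, List[Point]], p: float) -> List[str]:
    """Objects whose skyline probability is at least p."""
    return sorted(n for n in objects if skyline_probability(n, objects) >= p)

# Hypothetical data: three uncertain objects with two instances each.
objects = {
    "A": [(1, 4), (2, 2)],
    "B": [(3, 3), (1, 1)],
    "C": [(5, 5), (4, 6)],
}
print({n: skyline_probability(n, objects) for n in objects})
print(p_skyline(objects, 0.5))
```

This enumeration is quadratic in the number of instances, which is the cost that pruning-based algorithms aim to avoid.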
Efficient Computation of Reverse Skyline Queries
- 2007
- Cited by 23 (0 self)
In this paper, for the first time, we introduce the concept of Reverse Skyline Queries. First, we consider for a multidimensional data set P the problem of dynamic skyline queries according to a query point q. This kind of dynamic skyline corresponds to the skyline of a transformed data space where point q becomes the origin and all points of P are represented by their distance vector to q. The reverse skyline query returns the objects whose dynamic skyline contains the query object q. In order to compute the reverse skyline of an arbitrary query point, we first propose a Branch and Bound algorithm (called BBRS), which is an improved customization of the original BBS algorithm. Furthermore, we identify a superset of the reverse skyline that is used to bound the search space while computing the reverse skyline. To further reduce the computational cost of determining if a point belongs to the reverse skyline, we propose an enhanced algorithm (called RSSA) that is based on accurate pre-computed approximations of the skylines. These approximations are used to identify whether a point belongs to the reverse skyline or not. Through extensive experiments with both real-world and synthetic datasets, we show that our algorithms can efficiently support reverse skyline queries. Our enhanced approach improves reverse skyline processing by up to an order of magnitude compared to the algorithm that does not use pre-computed approximations.
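The two definitions above (dynamic skyline and reverse skyline) can be sketched with a brute-force check. This is only an illustration of the definitions, not BBRS or RSSA; it assumes per-dimension absolute differences as the distance vector and distinct points, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def dyn_dominates(a: Sequence[float], b: Sequence[float], ref: Sequence[float]) -> bool:
    """a dynamically dominates b with respect to ref: a is at least as close
    to ref in every dimension and strictly closer in at least one."""
    da = [abs(x - r) for x, r in zip(a, ref)]
    db = [abs(x - r) for x, r in zip(b, ref)]
    return all(x <= y for x, y in zip(da, db)) and any(x < y for x, y in zip(da, db))

def dynamic_skyline(points: List[Point], q: Point) -> List[Point]:
    """Points of P not dynamically dominated (w.r.t. q) by another point of P."""
    return [p for p in points
            if not any(dyn_dominates(o, p, q) for o in points if o != p)]

def reverse_skyline(points: List[Point], q: Point) -> List[Point]:
    """Points p of P whose dynamic skyline would contain q, i.e. no other
    point of P dynamically dominates q with respect to p."""
    return [p for p in points
            if not any(dyn_dominates(o, q, p) for o in points if o != p)]

# Hypothetical data set and query point.
P = [(1, 1), (4, 4), (6, 2), (9, 9)]
q = (5, 5)
print(dynamic_skyline(P, q))
print(reverse_skyline(P, q))
```

On this data the two result sets differ, which shows that the reverse skyline is not simply the dynamic skyline of q.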
Distance-based Representative Skyline
- Cited by 12 (1 self)
Given an integer k, a representative skyline contains the k skyline points that best describe the tradeoffs among different dimensions offered by the full skyline. Although this topic has been previously studied, the existing solution may sometimes produce k points that appear in an arbitrarily tiny cluster, and therefore, fail to be representative. Motivated by this, we propose a new definition of representative skyline that minimizes the distance between a non-representative skyline point and its nearest representative. We also study algorithms for computing distance-based representative skylines. In 2D space, there is a dynamic programming algorithm that guarantees the optimal solution. For dimensionality at least 3, we prove that the problem is NP-hard, and give a 2-approximate polynomial time algorithm. Using a multidimensional access method, our algorithm can directly report the representative skyline, without retrieving the full skyline. We show that our representative skyline not only better captures the contour of the entire skyline than the previous method, but also can be computed much faster.
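One way to picture the distance-based objective is the standard farthest-first greedy heuristic for the k-center problem, which is known to be a 2-approximation. The sketch below assumes the objective is the maximum Euclidean distance from a skyline point to its nearest representative, and it takes an already computed skyline as input. Whether it matches the paper's own 2-approximate algorithm is not stated in the abstract, and all names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]

def representation_error(sky: List[Point], reps: List[Point]) -> float:
    """Largest distance from any skyline point to its nearest representative."""
    return max(min(math.dist(s, r) for r in reps) for s in sky)

def greedy_representatives(sky: List[Point], k: int) -> List[Point]:
    """Farthest-first traversal: start from the first skyline point, then
    repeatedly add the point farthest from the representatives chosen so far."""
    reps = [sky[0]]
    while len(reps) < min(k, len(sky)):
        farthest = max(sky, key=lambda s: min(math.dist(s, r) for r in reps))
        reps.append(farthest)
    return reps

# Hypothetical 2D skyline with two clusters at opposite ends.
sky = [(0, 10), (1, 9), (2, 8), (9, 1), (10, 0)]
reps = greedy_representatives(sky, 2)
print(reps, representation_error(sky, reps))
```

With k = 2 the greedy choice picks one point from each end of the skyline rather than two points from the same tiny cluster, which is the behavior the distance-based definition is meant to encourage.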
Efficient Evaluation of Probabilistic Advanced Spatial Queries on Existentially Uncertain Data
- Cited by 6 (1 self)
We study the problem of answering spatial queries in databases where objects exist with some uncertainty and they are associated with an existential probability. The goal of a thresholding probabilistic spatial query is to retrieve the objects that satisfy the spatial predicates with a probability that exceeds a threshold. Accordingly, a ranking probabilistic spatial query selects the objects with the highest probabilities of satisfying the spatial predicates. We propose adaptations of spatial access methods and search algorithms for probabilistic versions of range queries, nearest neighbors, spatial skylines, and reverse nearest neighbors and conduct an extensive experimental study, which evaluates the effectiveness of the proposed solutions. Index Terms: H.2.4.h Query processing, H.2.4.k Spatial databases
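For the simplest of the listed query types, the range query, the thresholding and ranking variants can be sketched as a linear scan. The sketch assumes that an object inside the range qualifies with exactly its existential probability; it uses no spatial access method, and all names and data are hypothetical.

```python
from typing import List, Tuple

# (identifier, location, existential probability)
UncertainObject = Tuple[str, Tuple[float, float], float]

def in_range(point, lo, hi) -> bool:
    """Axis-aligned range predicate: lo <= point <= hi in every dimension."""
    return all(l <= x <= h for x, l, h in zip(point, lo, hi))

def threshold_range_query(objs: List[UncertainObject], lo, hi, t: float) -> List[str]:
    """Objects in the range whose existential probability exceeds t."""
    return [name for name, pt, prob in objs if in_range(pt, lo, hi) and prob > t]

def ranking_range_query(objs: List[UncertainObject], lo, hi, m: int) -> List[str]:
    """The m objects in the range with the highest existential probability."""
    hits = [(prob, name) for name, pt, prob in objs if in_range(pt, lo, hi)]
    hits.sort(key=lambda h: -h[0])
    return [name for _, name in hits[:m]]

# Hypothetical data.
objs = [("a", (1, 1), 0.9), ("b", (2, 3), 0.4), ("c", (8, 8), 0.95), ("d", (3, 2), 0.7)]
print(threshold_range_query(objs, (0, 0), (4, 4), 0.5))
print(ranking_range_query(objs, (0, 0), (4, 4), 1))
```

Queries such as nearest neighbor are harder because an object's qualifying probability then also depends on whether closer objects exist.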
Skyline query processing for incomplete data
- In Proc. 24th Int. Conf. on Data Engineering, 2008
- Cited by 5 (0 self)
Recently, there has been much interest in processing skyline queries for various applications that include decision making, personalized services, and search pruning. Skyline queries aim to prune a search space of large numbers of multidimensional data items to a small set of interesting items by eliminating items that are dominated by others. Existing skyline algorithms assume that all dimensions are available for all data items. This paper goes beyond this restrictive assumption as we address the more practical case of involving incomplete data items (i.e., data items missing values in some of their dimensions). In contrast to the case of complete data where the dominance relation is transitive, incomplete data suffer from a non-transitive dominance relation, which may lead to cyclic dominance behavior. We first propose two algorithms, namely, “Replacement” and “Bucket”, that use traditional skyline algorithms for incomplete data. Then, we propose the “ISkyline” algorithm that is designed specifically for the case of incomplete data. The “ISkyline” algorithm employs two optimization techniques, namely, virtual points and shadow skylines, to tolerate cyclic dominance relations. Experimental evidence shows that the “ISkyline” algorithm significantly outperforms variations of traditional skyline algorithms.
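The non-transitivity mentioned above is easy to reproduce. The sketch below assumes that dominance between incomplete items is evaluated only on the dimensions where both items have a value, that `None` marks a missing value, and that smaller values are better; the abstract does not spell out these details, and the data is hypothetical.

```python
from typing import Optional, Sequence

Item = Sequence[Optional[float]]

def dominates_incomplete(p: Item, q: Item) -> bool:
    """p dominates q on their common known dimensions: no worse in all of
    them and strictly better in at least one (assumed definition)."""
    common = [(x, y) for x, y in zip(p, q) if x is not None and y is not None]
    return (bool(common)
            and all(x <= y for x, y in common)
            and any(x < y for x, y in common))

# Hypothetical items; each pair shares exactly one known dimension.
a = (1, None, 5)
b = (2, 1, None)
c = (None, 2, 4)

print(dominates_incomplete(a, b))  # compared on dimension 0
print(dominates_incomplete(b, c))  # compared on dimension 1
print(dominates_incomplete(c, a))  # compared on dimension 2
print(dominates_incomplete(a, c))  # transitivity would require this to hold
```

Here a dominates b, b dominates c, and c dominates a, so every item is dominated by some other item, and a skyline algorithm that discards an item as soon as it is dominated can lose the evidence needed to eliminate another item.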
Dynamic Skyline Queries in Metric Spaces
- Cited by 5 (1 self)
Skyline queries are of great importance in many applications, such as multi-criteria decision making and business planning. In particular, a skyline point is a data object in the database whose attribute vector is not dominated by that of any other object. Previous methods to retrieve skyline points usually assume static data objects in the database (i.e., their attribute vectors are fixed), whereas several recent works focus on skyline queries with dynamic attributes. In this paper, we propose a novel variant of skyline queries, namely metric skyline, whose dynamic attributes are defined in the metric space (i.e., not limited to the Euclidean space). We illustrate an efficient and effective pruning mechanism to answer metric skyline queries through a metric index. Extensive experiments have demonstrated the efficiency and effectiveness of our proposed pruning techniques over the metric index in answering metric skyline queries.
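A minimal sketch of the metric skyline idea: assume each object's dynamic attribute vector consists of its distances to a set of query objects under some metric, and the skyline is taken over those vectors. Hamming distance on equal-length strings stands in for the metric here; the metric, the data, and all names are assumptions, and no metric index or pruning is used.

```python
from typing import Callable, List, Sequence

def hamming(a: str, b: str) -> int:
    """Hamming distance between equal-length strings (a metric)."""
    return sum(x != y for x, y in zip(a, b))

def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """u is no farther from every query object and strictly closer to one."""
    return all(x <= y for x, y in zip(u, v)) and any(x < y for x, y in zip(u, v))

def metric_skyline(objects: List[str], queries: List[str],
                   dist: Callable[[str, str], float]) -> List[str]:
    """Objects whose distance vector to the query objects is not dominated."""
    vectors = {o: [dist(o, q) for q in queries] for o in objects}
    return [o for o in objects
            if not any(dominates(vectors[other], vectors[o]) for other in objects)]

# Hypothetical data: strings compared under Hamming distance.
objects = ["abcd", "abcf", "xbcd", "zzzz"]
queries = ["abcd", "xbcf"]
print(metric_skyline(objects, queries, hamming))
```

Because only a distance function is required, the same code works for any metric, including ones with no coordinate representation.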
Minimizing the communication cost for continuous skyline maintenance
- In SIGMOD Conference, 2009
- Cited by 5 (1 self)
Existing work in the skyline literature focuses on optimizing the processing cost. This paper aims at minimization of the communication overhead in client-server architectures, where a server continuously maintains the skyline of dynamic objects. Our first contribution is a Filter method that avoids transmission of updates from objects that cannot influence the skyline. Specifically, each object is assigned a filter so that it needs to issue an update only if it violates its filter. Filter achieves significant savings over the naive approach of transmitting all updates. Going one step further, we introduce the concept of frequent skyline query over a sliding window (FSQW). The motivation is that snapshot skylines are not very useful in streaming environments because they keep changing over time. Instead, FSQW reports the objects that appear in the skylines of at least θ·s of the s most recent timestamps (0 < θ ≤ 1). Filter can be easily adapted to FSQW processing, although with potentially high overhead for large and frequently updated datasets. To further reduce the communication cost, we propose a Sampling method, which returns approximate FSQW results without computing each snapshot skyline. Finally, we integrate Filter and Sampling in a Hybrid approach that combines their individual advantages.
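The FSQW definition can be illustrated with an exact, centralized computation that recomputes every snapshot skyline in the window. This is the costly baseline the Filter and Sampling methods are designed to avoid, not a sketch of either method; it assumes smaller values are better, and the data and names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set

Snapshot = Dict[str, Sequence[float]]  # object id -> attribute vector

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline_ids(snapshot: Snapshot) -> Set[str]:
    """Ids of the objects not dominated by any other object in the snapshot."""
    return {i for i, p in snapshot.items()
            if not any(dominates(o, p) for o in snapshot.values())}

def fsqw(snapshots: List[Snapshot], s: int, theta: float) -> List[str]:
    """Objects in the skyline of at least theta*s of the s latest snapshots."""
    counts = Counter()
    for snap in snapshots[-s:]:
        counts.update(skyline_ids(snap))
    return sorted(i for i, c in counts.items() if c >= theta * s)

# Hypothetical stream of four timestamps.
snapshots = [
    {"a": (1, 5), "b": (5, 1), "c": (3, 3)},
    {"a": (1, 5), "b": (5, 1), "c": (6, 6)},
    {"a": (2, 2), "b": (5, 1), "c": (3, 3)},
    {"a": (4, 4), "b": (3, 3), "c": (1, 9)},
]
print(fsqw(snapshots, s=4, theta=0.75))
```

Object c enters and leaves the snapshot skyline, so it appears in only two of the four skylines and is filtered out at θ = 0.75.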
Topologically sorted skylines for partially ordered domains
- In ICDE, 2009
- Cited by 5 (0 self)
The vast majority of work on skyline queries considers totally ordered domains, whereas in many applications some attributes are partially ordered, for instance, domains of set values, hierarchies, intervals and preferences. The only work addressing this issue has limited progressiveness and pruning ability, and it is only applicable to static skylines. This paper overcomes these problems with the following contributions. (i) We introduce a generic framework, termed TSS, for handling partially ordered domains using topological sorting. (ii) We propose a novel dominance check that eliminates false hits/misses, further enhancing progressiveness and pruning ability. (iii) We extend our methodology to dynamic skylines with respect to an input query. In this case, the dominance relationships change according to the query specification, and their computation is rather complex. We perform an extensive experimental evaluation demonstrating that TSS is up to 9 times and up to 2 orders of magnitude faster than existing methods in the static and the dynamic case, respectively.
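To show why topological sorting alone is not enough for partially ordered attributes, the sketch below sorts a small preference DAG and pairs it with an exact reachability-based dominance check. It is not the TSS framework itself; the DAG, the two-attribute tuple layout, and all function names are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Tuple

# Hypothetical preference DAG: each key is preferred over the values listed.
prefers: Dict[str, List[str]] = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}

def topological_ordinals(prefers: Dict[str, List[str]]) -> Dict[str, int]:
    """Map each value to its position in one topological order.
    TopologicalSorter expects node -> predecessors, so edges are inverted."""
    preds = {v: set() for v in prefers}
    for better, worse_list in prefers.items():
        for worse in worse_list:
            preds[worse].add(better)
    return {v: i for i, v in enumerate(TopologicalSorter(preds).static_order())}

def is_preferred(prefers: Dict[str, List[str]], x: str, y: str) -> bool:
    """True if x strictly precedes y in the partial order (y reachable from x)."""
    stack, seen = list(prefers[x]), set()
    while stack:
        v = stack.pop()
        if v == y:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(prefers[v])
    return False

def dominates(prefers, p: Tuple[float, str], q: Tuple[float, str]) -> bool:
    """p, q are (numeric, categorical); smaller numeric values are better."""
    cat_better = is_preferred(prefers, p[1], q[1])
    no_worse = p[0] <= q[0] and (p[1] == q[1] or cat_better)
    return no_worse and (p[0] < q[0] or cat_better)

ordinals = topological_ordinals(prefers)
print(ordinals)
print(dominates(prefers, (1, "a"), (2, "d")), dominates(prefers, (1, "b"), (2, "c")))
```

Values b and c receive different ordinals even though neither is preferred over the other, so comparing ordinals alone would report a false dominance; the exact check rejects it.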
Kernel-Based Skyline Cardinality Estimation
- Cited by 4 (0 self)
The skyline of a d-dimensional dataset consists of all points not dominated by others. The incorporation of the skyline operator into practical database systems necessitates an efficient and effective cardinality estimation module. However, existing theoretical work on this problem is limited to the case where all d dimensions are independent of each other, which rarely holds for real datasets. The state-of-the-art Log Sampling (LS) technique simply applies theoretical results for independent dimensions to non-independent data anyway, sometimes leading to large estimation errors. To solve this problem, we propose a novel Kernel-Based (KB) approach that approximates the skyline cardinality with nonparametric methods. Extensive experiments with various real datasets demonstrate that KB achieves high accuracy, even in cases where LS fails. At the same time, despite its numerical nature, the efficiency of KB is comparable to that of LS. Furthermore, we extend both LS and KB to the k-dominant skyline, which is commonly used instead of the conventional skyline for high-dimensional data.
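Two extreme, deterministic datasets show how strongly the skyline cardinality depends on the relationship between dimensions, which is why an estimator built on an independence assumption can be far off. The sketch implements neither LS nor KB; it assumes smaller values are better, and the data is hypothetical.

```python
from typing import List, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline_size(points: List[Sequence[float]]) -> int:
    """Number of points not dominated by any other point (brute force)."""
    return sum(1 for p in points if not any(dominates(o, p) for o in points))

n = 50
correlated = [(i, i) for i in range(n)]               # both dimensions rise together
anti_correlated = [(i, n - 1 - i) for i in range(n)]  # one rises as the other falls

print(skyline_size(correlated), skyline_size(anti_correlated))
```

With the same number of points and dimensions, the skyline ranges from a single point to the whole dataset, so any cardinality estimate has to account for the data distribution.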
On domination game analysis for microeconomic data mining
- TKDD
- Cited by 4 (2 self)
Game theory is a powerful tool for the analysis of the competition among manufacturers in a market. In this paper, we present a study on combining game theory and data mining by introducing the concept of domination game analysis. We present a multidimensional market model, where every dimension represents one attribute of a commodity. Every product or customer is represented by a point in the multidimensional space, and a product is said to “dominate” a customer if all of its attributes can satisfy the requirements of the customer. The expected market share of a product is measured by the expected number of buyers among the customers, each of whom is equally likely to buy any product that dominates them. A Nash Equilibrium is a configuration of the products achieving stable expected market shares for all products. We prove that Nash Equilibrium in such a model can be computed in polynomial time if every manufacturer tries to modify its product in a round robin manner. To further improve the efficiency of the computation, we also design two algorithms for the manufacturers to efficiently find their best response to other products in the market.
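The expected market share described above can be computed directly for a fixed configuration of products. The sketch assumes that a product satisfies a customer when each of its attributes is at least the customer's requirement, and that a customer with m dominating products buys each with probability 1/m. The direction of the comparison, the data, and the names are assumptions, and the code does not search for a Nash Equilibrium or a best response.

```python
from fractions import Fraction
from typing import Dict, List, Sequence

def satisfies(product: Sequence[float], customer: Sequence[float]) -> bool:
    """Product 'dominates' customer: every attribute meets the requirement.
    Assumption: larger attribute values are better."""
    return all(p >= c for p, c in zip(product, customer))

def expected_market_shares(products: Dict[str, Sequence[float]],
                           customers: List[Sequence[float]]) -> Dict[str, Fraction]:
    """Each customer splits one unit of demand evenly among dominating products."""
    shares = {name: Fraction(0) for name in products}
    for c in customers:
        buyers = [name for name, p in products.items() if satisfies(p, c)]
        for name in buyers:
            shares[name] += Fraction(1, len(buyers))
    return shares

# Hypothetical market with three products and four customers.
products = {"P1": (5, 5), "P2": (3, 8), "P3": (2, 2)}
customers = [(1, 1), (3, 6), (4, 4), (9, 9)]
shares = expected_market_shares(products, customers)
print(shares)
```

Exact fractions make it easy to see how moving a product to dominate one more customer changes every competitor's share, which is the quantity a best-response computation would compare.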

