Maximal Vector Computation in Large Data Sets
In VLDB, 2005
Cited by 49 (1 self)
Finding the maximals in a collection of vectors is relevant to many applications. The maximal set is related to the convex hull (and hence to linear optimization) and to nearest neighbors. The maximal vector problem has resurfaced with the advent of skyline queries for relational databases and skyline algorithms that are external and relationally well behaved. The initial ...
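To make the maximal vector problem concrete, here is a minimal in-memory sketch, assuming "larger is better" in every dimension. It uses the classic block-nested-loop idea of keeping a window of vectors that nothing seen so far dominates; it is an illustrative baseline, not an algorithm from the paper, and the names `dominates` and `maximal_vectors` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q if p is >= q in every dimension and > q in at least one."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def maximal_vectors(points: List[Vector]) -> List[Vector]:
    """Block-nested-loop style: keep a window of vectors not dominated so far."""
    window: List[Vector] = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue  # p is dominated, so it cannot be maximal
        # p survives: evict window members it dominates, then keep it.
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window
```

For example, `maximal_vectors([(1, 5), (3, 3), (2, 2), (5, 1), (3, 4)])` keeps `(1, 5)`, `(5, 1)` and `(3, 4)`. By transitivity of dominance, a vector dominated by a discarded vector is also dominated by something still in the window, so comparing against the window alone is enough.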
Efficient Computation of the Skyline Cube
In VLDB, 2005
Cited by 41 (3 self)
Skyline has been proposed as an important operator for multi-criteria decision making, data mining and visualization, and user-preference queries. In this paper, we consider the problem of efficiently computing a Skycube, which consists of skylines of all possible non-empty subsets of a given set of dimensions. While existing skyline computation algorithms can be immediately extended to computing each skyline query independently, such "shared-nothing" algorithms are inefficient. We develop several computation sharing strategies based on effectively identifying the computation dependencies among multiple related skyline queries. Based on these sharing strategies, two novel algorithms, Bottom-Up and Top-Down, are proposed to compute the Skycube efficiently. Finally, our extensive performance evaluations confirm the effectiveness of the sharing strategies. It is ...
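The "shared-nothing" baseline that the abstract contrasts with can be sketched as follows, assuming smaller values are better and using a simple quadratic skyline per subspace. This shows only the baseline, not the Bottom-Up or Top-Down algorithms; all names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def dominates(p: Sequence[float], q: Sequence[float], dims: Tuple[int, ...]) -> bool:
    """Smaller is better: p dominates q on dims if p <= q everywhere and < somewhere."""
    return all(p[d] <= q[d] for d in dims) and any(p[d] < q[d] for d in dims)


def skyline(points: List[Sequence[float]], dims: Tuple[int, ...]) -> List[int]:
    """Indices of points not dominated by any other point in the subspace dims."""
    return [i for i, p in enumerate(points)
            if not any(dominates(q, p, dims) for q in points)]


def naive_skycube(points: List[Sequence[float]]) -> Dict[Tuple[int, ...], List[int]]:
    """Shared-nothing baseline: one independent skyline query per non-empty subspace."""
    d = len(points[0])
    return {dims: skyline(points, dims)
            for size in range(1, d + 1)
            for dims in combinations(range(d), size)}
```

For d dimensions this issues 2^d - 1 separate skyline queries and shares no work between them, which is the redundancy the sharing strategies are meant to remove.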
Refreshing the sky: the compressed skycube with efficient support for frequent updates
In SIGMOD, 2006
Cited by 32 (0 self)
The skyline query is important in many applications such as multi-criteria decision making, data mining, and user-preference queries. Given a set of d-dimensional objects, the skyline query finds the objects that are not dominated by others. In practice, different users may be interested in different dimensions of the data, and issue queries on any subset of the d dimensions. This paper focuses on supporting concurrent and unpredictable subspace skyline queries in frequently updated databases. Simply computing and storing the skyline objects of every subspace in a skycube would incur expensive update costs. In this paper, we investigate the important issue of updating the skycube in a dynamic environment. To balance the query cost and update cost, we propose a new structure, the compressed skycube, which concisely represents the complete skycube. We thoroughly explore the properties of the compressed skycube and provide an efficient object-aware update scheme. Experimental results show that the compressed skycube is both query and update efficient.
Towards Multidimensional Subspace Skyline Analysis
Cited by 14 (3 self)
The skyline operator is important for multicriteria decision-making applications. Although many recent studies developed efficient methods to compute skyline objects in a given space, none of them considers skylines in multiple subspaces simultaneously. More importantly, the fundamental ...
Approaching the Efficient Frontier: Cooperative Database Retrieval Using High-Dimensional Skylines
In Proc. of the Int. Conf. on Database Systems for Advanced Applications (DASFAA), 2005
Cited by 14 (13 self)
Cooperative database retrieval is a challenging problem: top-k retrieval delivers manageable results only when a suitable compensation function (e.g., a weighted mean) is explicitly given. On the other hand, skyline queries offer intuitive querying to users, but result set sizes grow exponentially and hence can easily exceed manageable levels. We show how to combine the advantages of skyline queries and top-k retrieval in an interactive query processing scheme that uses user feedback on a manageable, representative sample of the skyline set to derive the most adequate weightings for subsequent focused top-k retrieval. Hence, each user’s information needs are conveniently and intuitively obtained, and only a limited set of best-matching objects is returned. We will demonstrate our scheme’s efficient performance, manageable result sizes, and representativeness of the skyline. We will also show how to effectively estimate users’ compensation functions using their feedback. Our approach thus paves the way to intuitive and efficient cooperative retrieval with vague query predicates.
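The top-k side of this scheme ranks objects by an explicit compensation function. A minimal sketch with a weighted mean follows, assuming larger attribute values are better and that the weights are already known (in the described scheme they would be derived from user feedback on a skyline sample). The function `top_k_weighted` and the example data are hypothetical.

```python
from typing import Dict, List, Sequence


def top_k_weighted(objects: Dict[str, Sequence[float]],
                   weights: Sequence[float], k: int) -> List[str]:
    """Return the ids of the k objects with the highest weighted mean (larger is better)."""
    total = sum(weights)

    def weighted_mean(attrs: Sequence[float]) -> float:
        # Dividing by the weight total does not change the ranking; it keeps
        # the score on the same scale as the attributes.
        return sum(w * a for w, a in zip(weights, attrs)) / total

    return sorted(objects, key=lambda oid: weighted_mean(objects[oid]), reverse=True)[:k]
```

With `objects = {"x": (0.9, 0.2), "y": (0.5, 0.6), "z": (0.1, 0.95)}`, weights `(3, 1)` return `["x", "y"]` for k = 2, while weights `(1, 3)` put `"z"` first. With strictly positive weights the top-scoring object can never be dominated, so it always lies on the skyline.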
Algorithms and Analyses for Maximal Vector Computation
Cited by 14 (0 self)
The maximal vector problem is to identify the maximals over a collection of vectors. This arises in many contexts and, as such, has been well studied. The problem recently gained renewed attention with skyline queries for relational databases and with work to develop skyline algorithms that are external and relationally well behaved. While many algorithms have been proposed, how they perform has been unclear. We study the performance of, and design choices behind, these algorithms. We prove runtime bounds based on the number of vectors n and the dimensionality k. Early algorithms based on divide-and-conquer established seemingly good average and worst-case asymptotic runtimes. In fact, the problem can be solved in O(n) average-case time (holding k fixed). We prove, however, that the performance is quite bad with respect to k. We demonstrate that the more recent skyline algorithms are better behaved, and can also achieve O(kn) average-case time. While k matters for these, in practice, its effect vanishes in the asymptotic. We introduce a new external algorithm, LESS, that is more efficient and better behaved. We evaluate LESS’s effectiveness and improvement over the field, and prove that its average-case running time is O(kn).
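Sort-based skyline algorithms rely on presorting by a monotone function such as the sum of coordinates: if p dominates q, then p sorts strictly before q, so a point admitted to the window can never be evicted later. The sketch below shows only this presorting idea in memory (in the style of SFS), assuming smaller values are better. It is not LESS, which the abstract presents as an external algorithm; the names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Smaller is better: p dominates q if p <= q everywhere and p < q somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def presorted_skyline(points: List[Point]) -> List[Point]:
    """SFS-style skyline: presort by a monotone score, then filter in one pass."""
    # If p dominates q then sum(p) < sum(q), so every dominator is visited first.
    ordered = sorted(points, key=sum)
    window: List[Point] = []
    for p in ordered:
        if not any(dominates(w, p) for w in window):
            # Nothing visited later can dominate p, so p is final and never evicted.
            window.append(p)
    return window
```

For example, `presorted_skyline([(3, 3), (1, 4), (4, 1), (2, 2), (5, 5)])` returns `[(2, 2), (1, 4), (4, 1)]`. Because window entries are final as soon as they are added, they can be emitted immediately.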
Efficient Top-k Aggregation of Ranked Inputs
Cited by 11 (1 self)
A top-k query combines different rankings of the same set of objects and returns the k objects with the highest combined score according to an aggregate function. We bring to light some key observations, which impose two phases that any top-k algorithm based on sorted accesses should go through. Based on them, we propose a new algorithm, which is designed to minimize the number of object accesses, the computational cost, and the memory requirements of top-k search with monotone aggregate functions. We provide an analysis of its cost and show that it is always no worse than the baseline “no random accesses” algorithm in terms of computations, accesses, and memory required. As a side contribution, we perform a space analysis, which indicates the memory requirements of top-k algorithms that only perform sorted accesses. For the case where the required space exceeds the available memory, we propose disk-based variants of our algorithm. We propose and optimize a multiway top-k join operator, with certain advantages over evaluation trees of binary top-k join operators. Finally, we define and study the computation of top-k cubes and the implementation of roll-up and drill-down operations in such cubes. Extensive experiments with synthetic and real data show that, compared to previous techniques, our method accesses fewer objects, while being orders of magnitude faster.
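The baseline named in the abstract is the “no random accesses” (NRA) strategy. Below is a simplified reconstruction of NRA-style processing for SUM aggregation: it keeps a lower bound (worst score) and an upper bound (best score) for every object seen so far and stops once the k-th lower bound is at least every competing upper bound. This is the classic baseline only, not the algorithm proposed in the paper; the function name and input layout are hypothetical.

```python
from typing import Dict, List, Tuple


def nra_top_k(lists: List[List[Tuple[str, float]]], k: int) -> List[str]:
    """Top-k objects under SUM, reading each ranked list only in sorted order.

    Assumptions: every list is sorted by descending, non-negative score, and an
    object absent from a list contributes 0 from that list.
    """
    m = len(lists)
    seen: Dict[str, Dict[int, float]] = {}  # object -> {list index: score read}
    last = [0.0] * m                         # last score read from each list

    def worst(obj: str) -> float:
        # Lower bound: unknown scores are taken as 0.
        return sum(seen[obj].values())

    def best(obj: str) -> float:
        # Upper bound: an unknown score is at most the last score read in that list.
        return worst(obj) + sum(last[i] for i in range(m) if i not in seen[obj])

    for depth in range(max(len(lst) for lst in lists)):
        # One round of sorted accesses: the next entry of every list.
        for i, lst in enumerate(lists):
            if depth < len(lst):
                obj, score = lst[depth]
                seen.setdefault(obj, {})[i] = score
                last[i] = score
            else:
                last[i] = 0.0  # exhausted list: nothing unseen remains in it
        ranked = sorted(seen, key=worst, reverse=True)
        if len(ranked) < k:
            continue
        kth_worst = worst(ranked[k - 1])
        threshold = sum(last)  # best possible score of an object not seen yet
        if threshold <= kth_worst and all(best(o) <= kth_worst for o in ranked[k:]):
            return ranked[:k]
    # Every list was read to the end, so all scores are now exact.
    return sorted(seen, key=worst, reverse=True)[:k]
```

With the two lists `[("a", 0.9), ("b", 0.8), ("c", 0.1)]` and `[("b", 0.9), ("a", 0.7), ("c", 0.2)]`, the top-1 answer `["b"]` is returned after two rounds, before object `c` has been read at all. The returned set is correct even though the exact scores of its members may still be unknown at that point.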
Exploiting Indifference for Customization of Partial Order Skylines
Int. Database Engineering and Applications Symp. (IDEAS), 2006
Cited by 10 (4 self)
Unlike numerical preferences, preferences on attribute values do not exhibit an inherent total order; instead, skyline computation has to rely on partial orderings explicitly stated by the user. In such orders, many object values are incomparable, so skyline sizes become impractically large. However, the Pareto semantics can be modified to benefit from indifferences: skyline result sizes can be substantially reduced by allowing the user to declare some incomparable values as equally desirable. A major problem of adding such equivalences is that they may make the aggregated Pareto order intransitive, which hampers efficient query processing. In this paper we analyze how far the strict Pareto semantics can be relaxed while always retaining transitivity of the induced Pareto aggregation. Extensive practical tests show that skyline sizes can indeed be reduced by about two orders of magnitude when using the maximum possible relaxation that still guarantees consistency with all user preferences.
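A small sketch of Pareto dominance over per-attribute partial orders, with optional user-declared indifferences, may help make the idea concrete. Each order is assumed to be given as a transitively closed set of (better, worse) pairs; the names and the example data are hypothetical, and the sketch does not check the transitivity condition that the paper analyzes.

```python
from typing import List, Sequence, Set, Tuple

Pairs = Set[Tuple[str, str]]  # for an order: (a, b) means a is strictly better than b


def at_least_as_good(order: Pairs, equiv: Pairs, a: str, b: str) -> bool:
    """a is better than, equal to, or declared equally desirable as b."""
    return a == b or (a, b) in order or (a, b) in equiv or (b, a) in equiv


def dominates(x: Sequence[str], y: Sequence[str],
              orders: List[Pairs], equivs: List[Pairs]) -> bool:
    """Pareto dominance: at least as good on every attribute, better on one."""
    weakly = all(at_least_as_good(o, e, a, b) for o, e, a, b in zip(orders, equivs, x, y))
    strictly = any((a, b) in o for o, a, b in zip(orders, x, y))
    return weakly and strictly


def skyline(objects: List[Tuple[str, ...]], orders: List[Pairs],
            equivs: List[Pairs]) -> List[Tuple[str, ...]]:
    """Objects not dominated by any other object (orders are assumed irreflexive)."""
    return [x for x in objects
            if not any(dominates(y, x, orders, equivs) for y in objects)]
```

With `orders = [{("red", "blue")}, {("A", "B")}]` and objects `("red", "A")`, `("green", "B")`, `("blue", "A")`, the strict semantics keeps both `("red", "A")` and `("green", "B")`, because green is incomparable to red. Declaring green and red equally desirable (`equivs = [{("green", "red")}, set()]`) lets `("red", "A")` dominate `("green", "B")` and shrinks the skyline to one object.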
Eliciting Matters -- Controlling Skyline Sizes by Incremental Integration of User Preferences
Int. Conf. on Database Systems for Advanced Applications (DASFAA), 2007
Cited by 9 (4 self)
Today, result sets of skyline queries are unmanageable due to their exponential growth with the number of query predicates. In this paper we discuss the incremental re-computation of skylines based on additional information elicited from the user. Extending the traditional case of totally ordered domains, we consider preferences in their most general form as strict partial orders of attribute values. After obtaining an initial skyline set, our basic approach aims at interactively increasing the system’s information about the user’s wishes, explicitly including indifferences. The additional knowledge is then incorporated into the preference information and steadily reduces skyline sizes. In fact, our approach even allows users to specify trade-offs between different query predicates, thus effectively decreasing the query dimensionality. We give a theoretical proof of the soundness and consistency of the extended preference information and an extensive experimental evaluation of the efficiency of our approach. On average, skyline sizes can be considerably decreased in each elicitation step.
Database Querying under Changing Preferences
2006
Cited by 9 (2 self)
We present here a formal foundation for an iterative and incremental approach to constructing and evaluating preference queries. Our main focus is on query modification: a query transformation approach which works by revising the preference relation in the query. We provide a detailed analysis of the cases where the order-theoretic properties of the preference relation are preserved by the revision. We consider a number of different revision operators: union, prioritized and Pareto composition. We also formulate algebraic laws that enable incremental evaluation of preference queries. Finally, we consider two variations of the basic framework: finite restrictions of preference relations and weak-order extensions of strict partial order preference relations.
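The revision operators listed in the abstract can be written as combinators on preference relations represented as predicates. The definitions below follow one common formulation from the preference-query literature and are an assumption here; the paper's exact definitions may differ in detail. All names, including the example preferences, are hypothetical.

```python
from typing import Any, Callable

Pref = Callable[[Any, Any], bool]  # pref(x, y) is True when x is strictly preferred to y


def union(p1: Pref, p2: Pref) -> Pref:
    # x is preferred if either relation prefers it.
    return lambda x, y: p1(x, y) or p2(x, y)


def prioritized(p1: Pref, p2: Pref) -> Pref:
    # p1 wins outright; p2 is consulted only when p1 does not prefer y to x.
    return lambda x, y: p1(x, y) or (not p1(y, x) and p2(x, y))


def pareto(p1: Pref, p2: Pref) -> Pref:
    # x wins on one preference and does not lose on the other.
    return lambda x, y: (p1(x, y) and not p2(y, x)) or (p2(x, y) and not p1(y, x))


# Hypothetical example preferences over (price, rating) tuples.
cheaper: Pref = lambda x, y: x[0] < y[0]
better_rated: Pref = lambda x, y: x[1] > y[1]
```

For `a = (10, 3)` and `b = (20, 5)`, the Pareto composition leaves `a` and `b` incomparable, the prioritized composition (price first) prefers `a`, and the union prefers each over the other. The union case shows concretely why a revision does not always preserve order-theoretic properties such as asymmetry.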

