## Answering top-k queries with multi-dimensional selections: The ranking cube approach (2006)

Venue: VLDB

Citations: 17 (7 self)

### BibTeX

@INPROCEEDINGS{Xin06answeringtop-k,

author = {Dong Xin and Jiawei Han and Hong Cheng and Xiaolei Li},

title = {Answering top-k queries with multi-dimensional selections: The ranking cube approach},

booktitle = {In VLDB},

year = {2006},

pages = {463--475}

}

### Abstract

In many real applications, a top-k query consists of two components that reflect a user’s preference: a selection condition and a ranking function. A user may not only propose ad hoc ranking functions, but also use different interesting subsets of the data. In many cases, a user may want to study the data thoroughly by initiating a multi-dimensional analysis of the top-k query results. Previous work on top-k query processing mainly focuses on optimizing data access according to the ranking function only. The problem of efficiently answering top-k queries with multi-dimensional selections has not yet been well addressed. This paper proposes a new computational model, called the ranking cube, for efficiently answering top-k queries with multi-dimensional selections. We define a rank-aware measure for the cube, capturing our goal of responding to multi-dimensional ranking analysis. Based on the ranking cube, an efficient query algorithm is developed which progressively retrieves data blocks until the top-k results are found. The curse of dimensionality is a well-known challenge for the data cube, and we cope with this difficulty by introducing a new technique of ranking fragments. Our experiments on Microsoft’s SQL Server 2005 show that our proposed approaches significantly improve on the previous methods.
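The progressive block retrieval mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name `top_k`, the equi-width grid with `bins` cells per ranking dimension, and the squared-distance ranking function (one example of a convex function) are all assumptions made for illustration, and tuples are grouped into blocks at query time here only for brevity, whereas a materialized cube would presumably hold such lists ahead of time. Ranking values are assumed to lie in [0, 1].

```python
import heapq


def top_k(tuples, selection, target, k, bins=4):
    """Return up to k (score, tid) pairs for tuples matching `selection`,
    ordered by squared distance of their ranking values to `target`.

    tuples: list of (selection_attrs: dict, ranking_values: tuple in [0, 1]).
    All names and the grid layout are illustrative assumptions.
    """
    # Group tids of tuples that satisfy the selection by grid block id.
    blocks = {}
    for tid, (attrs, rank) in enumerate(tuples):
        if all(attrs.get(a) == v for a, v in selection.items()):
            bid = tuple(min(int(x * bins), bins - 1) for x in rank)
            blocks.setdefault(bid, []).append(tid)

    def score(rank):
        # Hypothetical ranking function: squared distance to the target.
        return sum((x - t) ** 2 for x, t in zip(rank, target))

    def lower_bound(bid):
        # Best possible score inside a block: clamp the target into its box.
        clamped = [min(max(t, b / bins), (b + 1) / bins)
                   for b, t in zip(bid, target)]
        return score(clamped)

    best = []  # heap of (-score, tid); keeps the k smallest scores seen
    for bound, bid in sorted((lower_bound(b), b) for b in blocks):
        # Stop once no remaining block can beat the current k-th best score.
        if len(best) == k and -best[0][0] <= bound:
            break
        for tid in blocks[bid]:
            heapq.heappush(best, (-score(tuples[tid][1]), tid))
            if len(best) > k:
                heapq.heappop(best)
    return sorted((-s, tid) for s, tid in best)
```

Blocks are visited in order of their lower bound, so retrieval can stop early without scanning every block; the choice of bound depends on the ranking function and would need to change for other functions.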

### Citations

577 | Optimal Aggregation Algorithms for Middleware
- Fagin, Lotem, et al.
- 2001
Citation Context ...unctions are convex. Note that we made no assumption on the linear weights, and they can be chosen either positive or negative. Hence the convex functions are more general than the commonly discussed monotone [14] functions, where the weights are restricted to be non-negative. Many distance measures are also convex functions. Suppose a top-k query looks for k tuples $t = (t_1, \ldots, t_r)$ which are the closest to... |

191 | Equi-depth histograms for estimating selectivity factors for multi-dimensional queries
- Dewitt
- 1988
Citation Context ... $(T/P)^{1/R}$, where R is the number of ranking dimensions and T is the number of tuples in the database. There are other partition strategies, e.g., equi-width partition, multi-dimensional partition [20], etc. For simplicity, we demonstrate our method using equi-depth partitioning. Our framework accepts other partitioning strategies, and we will discuss this in Section 6. Without loss of generality, ... |

149 | Accurate Estimation of the Number of Tuples Satisfying a Condition
- Piatetsky-Shapiro
- 1984
Citation Context ...ltiple blocks such that (1) the expected number of tuples in each block is P, and (2) the tuples in the same block are geometrically close to each other. One possible way is the equi-depth partition [22] of each ranking dimension. The number of bins b for each dimension can be calculated by $b = (T/P)^{1/R}$, where R is the number of ranking dimensions and T is the number of tuples in the database. Th... |

142 | Modern Information Retrieval: A Brief Overview,” Bulletin of the
- Singhal
- 2001
Citation Context ... in ranking fragments can be performed much faster using the bit-AND operation than the standard merge-intersect operation. Another compression method for the tid-lists comes from information retrieval [24]. The main observation is that the numbers in the tid-list are stored in ascending order. Thus, it would be possible to store a list of tid differences instead of the actual numbers. The insight is tha... |

133 | Prefer: A system for the efficient execution of multi-parametric ranked queries
- Hristidis, Koudas, et al.
- 2001
Citation Context ...ata organization includes Onion [8], which builds convex hulls on data records according to their geometry relations and answers top-k queries by progressively retrieving data with levels; and PREFER [16], which creates ranked views and answers top-k queries by mapping query parameters to view parameters. Both approaches assume the ranking functions are linear, and hence have limitations to answer oth... |

125 | Fuzzy Queries in Multimedia Database Systems
- Fagin
- 1998
Citation Context ... 6. DISCUSSION. In this section, we discuss the related work and possible extensions of our proposed approach. 6.1 Related Work. Top-k query processing has been studied in both the middleware scenario [13, 14, 7] and in the relational database setting [5, 4, 9, 17, 18, 16]. These studies mainly discuss the configurations where only ranking dimensions are involved; the problem of top-k queries with multi-dimen... |

116 | Multi-Dimensional Regression Analysis of Time-Series Data Streams
- Chen, Dong, et al.
Citation Context ...The pre-computed measures in the cube are generally simple statistics (e.g., SUM, COUNT, AVERAGE). Some recent proposals introduce more complex measures for the data cube, such as a linear regression model [11] and a classification model [10]. To the best of our knowledge, this is the first piece of work that provides multidimensional ranking analysis using a data cube. The tid list stored in the ranking cube i... |

103 | Top-k Selection Queries Over Relational Databases: Mapping Strategies and Performance Evaluation
- Bruno, Chaudhuri, et al.
- 2002
Citation Context ...swer other common ranking functions. Moreover, their data organizations are not aware of the multi-dimensional selection conditions. A closely related study is the top-k selection queries proposed in [4], where the authors proposed to map a top-k selection query to a range query. The soft selection conditions in their queries are essentially the ranking functions for the k nearest neighbor search and... |

97 | Bitmap index design and evaluation
- Chan, Ioannidis
Citation Context ... can be compressed, such that each block contains more tids. As a result, the system will retrieve fewer blocks when evaluating a ranked query. One compression method is bitmap indexing [2, 6]. In many applications, the cardinalities of selection dimensions are small. For example, in the used car database, the majority of selection dimensions only have 2 possible values, e.g., whether it h... |

86 | RankSQL: query algebra and optimization for relational top-k queries
- Li, Chang, et al.
- 2005
Citation Context ...ated work and possible extensions of our proposed approach. 6.1 Related Work. Top-k query processing has been studied in both the middleware scenario [13, 14, 7] and in the relational database setting [5, 4, 9, 17, 18, 16]. These studies mainly discuss the configurations where only ranking dimensions are involved; the problem of top-k queries with multi-dimensional selections is not well addressed. The closest related ... |

74 | On Saying "Enough Already
- Carey, Kossmann
- 1997 |

69 | The Onion Technique: Indexing for Linear Optimization Queries
- Chang, Bergman, et al.
- 2000
Citation Context ... database executor still needs to issue multiple random accesses on the data. This is quite expensive, especially when the database is large. Recent work on rank-aware data organization includes Onion [8], which builds convex hulls on data records according to their geometry relations and answers top-k queries by progressively retrieving data with levels; and PREFER [16], which creates ranked views an... |

57 | Efficient Searching with Linear Constraints
- Agarwal, Arge, et al.
- 1998
Citation Context ...g equi-depth partitioning. Our framework accepts other partitioning strategies, and we will discuss this in Section 6. Without loss of generality, we assume that the range of each ranking dimension is [0, 1]. We refer to the partitioned blocks as base blocks, and the new block dimension B contains the base block IDs (simplified as bid) for each tuple. The original database can be decomposed into two sub-dat... |

45 | Principles of Mathematical Analysis (3rd Ed
- Rudin
- 1976
Citation Context ...ex ranking functions. Its extension to other functions will be discussed later in this paper. The formal definition of the convex function is presented in Definition 1. Definition 1. (Convex Function [23]) A continuous function $f$ is convex if for any two points $x_1$ and $x_2$ in its domain $[a, b]$, and any $\lambda$ where $0 < \lambda < 1$: $f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2)$. The convex functions already cover a bro... |

43 | Rank-aware Query Optimization
- Ilyas, Shah, et al.
- 2004
Citation Context ...ated work and possible extensions of our proposed approach. 6.1 Related Work. Top-k query processing has been studied in both the middleware scenario [13, 14, 7] and in the relational database setting [5, 4, 9, 17, 18, 16]. These studies mainly discuss the configurations where only ranking dimensions are involved; the problem of top-k queries with multi-dimensional selections is not well addressed. The closest related ... |

41 | Optimizing queries on compressed bitmaps
- Amer-Yahia, Johnson
Citation Context ... can be compressed, such that each block contains more tids. As a result, the system will retrieve fewer blocks when evaluating a ranked query. One compression method is bitmap indexing [2, 6]. In many applications, the cardinalities of selection dimensions are small. For example, in the used car database, the majority of selection dimensions only have 2 possible values, e.g., whether it h... |

35 | Integrating db and ir technologies: What is the sound of one hand clapping
- Chaudhuri, Ramakrishnan, et al.
- 2005
Citation Context ...ated work and possible extensions of our proposed approach. 6.1 Related Work. Top-k query processing has been studied in both the middleware scenario [13, 14, 7] and in the relational database setting [5, 4, 9, 17, 18, 16]. These studies mainly discuss the configurations where only ranking dimensions are involved; the problem of top-k queries with multi-dimensional selections is not well addressed. The closest related ... |

32 | Processing Queries by Linear Constraints
- Goldstein, Ramakrishnan, et al.
- 1997
Citation Context ...ections imposed on queries. We organize tuples into different blocks based on their geometry layout information. Previous work on exploiting data distributions for efficient query processing includes [8, 1, 15]. The ranking functions covered in those studies are linear. We have demonstrated our method with convex functions, and the extension to ad hoc ranking functions is fairly straightforward (see Section 6... |

25 | High-Dimensional OLAP: A Minimal Cubing Approach
- Li, Han, et al.
- 2004
Citation Context ...onal space. Their data structures and algorithms were only designed to index data points, with measure aggregation. The model of multi-dimensional inverted index and measure aggregation is studied by [19]. The major difference between our approach and all these studies is that our bid list is rank-aware and supports progressive retrieval for efficient processing of top-k queries. Moreover, the contents in t... |

14 | Minimal probing: supporting expensive predicates for top-k queries
- Kevin Chen-Chuan Chang, Seung-won Hwang
- 2002
Citation Context ... 6. DISCUSSION. In this section, we discuss the related work and possible extensions of our proposed approach. 6.1 Related Work. Top-k query processing has been studied in both the middleware scenario [13, 14, 7] and in the relational database setting [5, 4, 9, 17, 18, 16]. These studies mainly discuss the configurations where only ranking dimensions are involved; the problem of top-k queries with multi-dimen... |

8 | Optimal multidimensional query processing using tree striping
- Berchtold, Böhm, et al.
Citation Context ...king analysis using data cube. The tid list stored in the ranking cube is similar to the ideas of inverted index as termed in information retrieval and value-list index as termed in databases. In [3], the authors investigated the usage of low dimensional data structures for indexing a high dimensional space. Their data structures and algorithms were only designed to index data points, with measur... |

4 | An overview of data warehousing and data cube
- Chaudhuri, Dayal
- 1997
Citation Context ...entially the ranking functions for the k nearest neighbor search and our problem of answering top-k queries with hard selection conditions is not considered. For multi-dimensional analysis, data cube [12] has been extensively studied. Materialization of a data cube is a way to pre-compute and store multi-dimensional aggregates so that online analytical processing can be performed efficiently. Traditio... |

2 | Improved query performance with variant indexes
- Patrick O’Neil, Dallan Quass
- 1997 |

1 | Prediction cubes
- Chen, Chen, et al. (including Raghu Ramakrishnan)
- 2005
Citation Context ...he cube are generally simple statistics (e.g., SUM, COUNT, AVERAGE). Some recent proposals introduce more complex measures for the data cube, such as a linear regression model [11] and a classification model [10]. To the best of our knowledge, this is the first piece of work that provides multidimensional ranking analysis using a data cube. The tid list stored in the ranking cube is similar to the ideas of inve... |
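Several citation contexts above describe small mechanisms that are easy to illustrate: the bin count $b = (T/P)^{1/R}$ for equi-depth partitioning, storing tid differences instead of ascending tids, and intersecting tid sets with a bit-AND. The following is a minimal sketch under stated assumptions, not the paper's implementation: all function names are hypothetical, rounding the bin count to the nearest integer is an assumption (the snippets do not say how a fractional value is handled), and Python integers stand in for bitmaps.

```python
def bins_per_dimension(T, P, R):
    """Bins per ranking dimension, b = (T / P) ** (1 / R).

    T: number of tuples, P: expected tuples per block, R: ranking dimensions.
    Rounding to the nearest integer is an illustrative assumption.
    """
    return max(1, round((T / P) ** (1.0 / R)))


def delta_encode(tids):
    """Store gaps between consecutive tids of an ascending tid-list."""
    gaps, prev = [], 0
    for tid in tids:
        gaps.append(tid - prev)
        prev = tid
    return gaps


def delta_decode(gaps):
    """Rebuild the ascending tid-list from its gaps."""
    tids, total = [], 0
    for gap in gaps:
        total += gap
        tids.append(total)
    return tids


def to_bitmap(tids):
    """Represent a tid set as an integer with bit `tid` set."""
    bits = 0
    for tid in tids:
        bits |= 1 << tid
    return bits


def from_bitmap(bits):
    """List the tids whose bits are set, in ascending order."""
    tids, tid = [], 0
    while bits:
        if bits & 1:
            tids.append(tid)
        bits >>= 1
        tid += 1
    return tids
```

For example, `from_bitmap(to_bitmap(a) & to_bitmap(b))` intersects two tid sets with a single AND, and the gaps produced by `delta_encode` are typically smaller numbers than the tids themselves, which is what makes them cheaper to store.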