Results 1 - 7 of 7
Optimizing Queries over Multimedia Repositories
, 1996
Cited by 78 (8 self)
Multimedia repositories and applications that retrieve multimedia information are becoming increasingly popular. In this paper, we study the problem of selecting objects from multimedia repositories, and show how this problem relates to the processing and optimization of selection queries in other contexts, e.g., when some of the selection conditions are expensive user-defined predicates. We find that the problem has unique characteristics that lead to interesting new research questions and results. This article presents an overview of the results in [1]. An expanded version of that paper is in preparation [2].
1 Query Model
In this section we first describe the model that we use for querying multimedia repositories. Then, we briefly review related models for querying text and image repositories.
1.1 Our Query Model
In our model, a multimedia repository consists of a set of multimedia objects, each with a distinct object identity. Each multimedia object has a set of attributes, like...
Optimizing top-k selection queries over multimedia repositories
, 2003
Cited by 38 (2 self)
Repositories of multimedia objects having multiple types of attributes (e.g., image, text) are becoming increasingly common. A query on these attributes will typically request not just a set of objects, as in the traditional relational query model (filtering), but also a grade of match associated with each object, which indicates how well the object matches the selection condition (ranking). Furthermore, unlike in the relational model, users may just want the k top-ranked objects for their selection queries, for a relatively small k. In addition to the differences in the query model, another peculiarity of multimedia repositories is that they may allow access to the attributes of each object only through indexes. In this paper, we investigate how to optimize the processing of top-k selection queries over multimedia repositories. The access characteristics of the repositories and the above query model lead to novel issues in query optimization. In particular, the choice of the indexes used to search the repository strongly influences the cost of processing the filtering condition. We define an execution space that is search-minimal, i.e., the set of indexes searched is minimal. Although the general problem of picking an optimal plan in the search-minimal execution space is NP-hard, we present an efficient algorithm that solves the problem optimally with respect to our cost model and execution space when the predicates in the query are independent. We also show that the problem of optimizing top-k selection queries can be viewed, in many cases, as that of evaluating more traditional selection conditions. Thus,
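To make the difference between filtering and ranking concrete, the following is a minimal sketch of a naive top-k selection. It is not the optimized, index-based algorithm the abstract refers to: it scans every object, which is exactly what such an optimizer tries to avoid. The repository layout, the `top_k` function, the grading functions, and the choice of `min` to combine grades are all assumptions made for illustration.

```python
import heapq

def top_k(repo, grade_fns, k):
    """Return the k (grade, object_id) pairs with the highest combined grade.

    repo      : dict mapping a hypothetical object id to its attribute record
    grade_fns : functions mapping a record to a grade of match in [0, 1]
    Combining per-attribute grades with min is an assumption of this sketch.
    """
    scored = []
    for obj_id, obj in repo.items():
        # Grade of match for the whole selection condition.
        grade = min(fn(obj) for fn in grade_fns)
        scored.append((grade, obj_id))
    # Keep only the k best-ranked objects, highest grade first.
    return heapq.nlargest(k, scored)

# Hypothetical repository: each record stores precomputed grades per attribute.
repo = {
    "a": {"color": 0.9, "text": 0.4},
    "b": {"color": 0.7, "text": 0.8},
    "c": {"color": 0.2, "text": 0.9},
}
grade_fns = [lambda o: o["color"], lambda o: o["text"]]
result = top_k(repo, grade_fns, 2)
```

Here the combined grades are 0.4, 0.7, and 0.2 for objects a, b, and c, so the two top-ranked objects are b and then a.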
A Lower Bound Theorem for Indexing Schemes and its Application to Multidimensional Range Queries
 In Proceedings of the ACM Symposium on Principles of Database Systems
, 1998
Cited by 14 (1 self)
Indexing schemes were proposed by Hellerstein, Koutsoupias and Papadimitriou [7] to model data indexing on external memory. Using indexing schemes, the complexity of indexing is quantified by two parameters: storage redundancy and access overhead. There is a tradeoff between these two parameters, in the sense that for some problems it is not possible for both of these to be low. In this paper we derive a lower-bounds theorem for arbitrary indexing schemes. We apply our theorem to the particular problem of d-dimensional range queries. We first resolve the open problem of [7] for a tight lower bound for 2-dimensional range queries and extend our lower bound to d-dimensional range queries. We then show how the construction in our lower-bounds proof may be exploited to derive indexing schemes for d-dimensional range queries, whose asymptotic complexity matches our lower bounds.
Nearest Neighbor Search in Multidimensional Spaces
, 1999
Cited by 8 (0 self)
The Nearest Neighbor Search problem is defined as follows: given a set P of n points, preprocess the points so as to efficiently answer queries that require finding the closest point in P to a query point q. If we are willing to settle for a point that is almost as close as the nearest neighbor, then we can relax the problem to the approximate Nearest Neighbor Search. Nearest Neighbor Search (exact or approximate) is an integral component in a wide range of applications that include multimedia databases, computational biology, data mining, and information retrieval. The common thread in all these applications is similarity search: given a database of objects, we want to return the object in the database that is most similar to a query object. The objects are mapped onto points in a high-dimensional metric space, and similarity search reduces to a nearest neighbor search. The dimension of the underlying space may be on the order of a few hundreds, or thousands; therefore, we r...
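The problem statement above can be illustrated with a brute-force baseline: a linear scan that involves no preprocessing and is therefore not the kind of efficient data structure the abstract is concerned with. The function names are hypothetical, Euclidean distance is assumed as the metric, and the (1 + eps) factor is one common way to formalize "almost as close"; the abstract itself does not fix a definition.

```python
import math

def nearest_neighbor(points, q):
    """Exact nearest neighbor of q in points by linear scan (Euclidean metric)."""
    return min(points, key=lambda p: math.dist(p, q))

def is_approx_nearest(points, q, candidate, eps):
    """Check whether candidate is within a factor (1 + eps) of the true
    nearest-neighbor distance. This formalization is an assumption here."""
    best = math.dist(nearest_neighbor(points, q), q)
    return math.dist(candidate, q) <= (1 + eps) * best

# Hypothetical data: three points in the plane and one query point.
points = [(0, 0), (3, 4), (1, 1)]
q = (1, 2)
nn = nearest_neighbor(points, q)
```

With this data the exact nearest neighbor is (1, 1) at distance 1. The point (0, 0), at distance sqrt(5), is accepted as an approximate answer for eps = 1.5 but rejected for eps = 0.5.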
Fractal Dimension and Vector Quantization [Extended Abstract]
, 2003
Krishna Kumaraswamy, Center for Automated Learning and Discovery, Carnegie Mellon University, skkumar@cs.cmu.edu
Vasileios Megalooikonomou, Department of Computer and Information Sciences, Temple University, vasilis@cis.temple.edu
ABSTRACT
Is there a way to determine the performance of a Vector Quantization method? Is there a fast method for determining the performance of a Vector Quantization method? In this paper, we show the relationship between the concepts of Vector Quantization and intrinsic "Fractal" Dimension. We derive a formula predicting the error rate of a Vector Quantization method, given the fractal dimension of the data set. We show that our result is true on synthetic as well as on real data sets. Also, we discuss how we can use our result for better, faster use in several data mining tasks.
Fractal Dimension for Data Mining
In this project, we introduce the concept of intrinsic "fractal" dimension of a data set and show how this can be used to aid in several data mining tasks. We are interested in answering questions about the performance of a method and also in comparing methods quickly. In particular, we discuss two specific problems: dimensionality reduction and vector quantization. In each of these problems, we show how the performance of a method is related to the fractal dimension of the data set. Using real and synthetic data sets, we validate these relationships and show how we can use this for faster evaluation and comparison of the methods.
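As an illustration of what an intrinsic "fractal" dimension measures, here is a minimal sketch of a box-counting estimate. Box counting is one common estimator and is assumed here for illustration; the abstract does not say which estimator the authors use. The function name and the choice of cell sizes are hypothetical. The estimate is the least-squares slope of log N(r) against log(1/r), where N(r) is the number of grid cells of side r that contain at least one point.

```python
import math

def box_counting_dimension(points, cell_sizes):
    """Estimate the box-counting dimension of a point set.

    points     : list of equal-length coordinate tuples
    cell_sizes : grid cell side lengths r to probe (an assumed parameter)
    """
    xs, ys = [], []
    for r in cell_sizes:
        # Set of grid cells of side r that contain at least one point.
        occupied = {tuple(math.floor(c / r) for c in p) for p in points}
        xs.append(math.log(1.0 / r))
        ys.append(math.log(len(occupied)))
    # Least-squares slope of log N(r) versus log(1/r).
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

sizes = [1 / 2, 1 / 4, 1 / 8, 1 / 16, 1 / 32]
# Synthetic data: points on a diagonal line embedded in the plane.
line = [(i / 1000, i / 1000) for i in range(1000)]
# Synthetic data: points filling the unit square on a regular grid.
square = [(i / 32, j / 32) for i in range(32) for j in range(32)]
line_dim = box_counting_dimension(line, sizes)
square_dim = box_counting_dimension(square, sizes)
```

Although both data sets live in a two-dimensional embedding space, the line yields an estimate close to 1 and the filled square an estimate close to 2, which is the sense in which the intrinsic dimension can be lower than the embedding dimension.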