Evaluating Top-k Queries over Web-Accessible Databases
ACM Transactions on Database Systems, 2004
Cited by 172 (11 self)
... In this article, we study how to process top-k queries efficiently in this setting, where the attributes for which users specify target values might be handled by external, autonomous sources with a variety of access interfaces. We present a sequential algorithm for processing such queries, but observe that any sequential top-k query processing strategy is bound to require unnecessarily long query processing times, since web accesses exhibit high and variable latency. Fortunately, web sources can be probed in parallel, and each source can typically process concurrent requests, although sources may impose some restrictions on the type and number of probes that they are willing to accept. We adapt our sequential query processing technique and introduce an efficient algorithm that maximizes source-access parallelism to minimize query response time, while satisfying source-access constraints.
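Below is a minimal sketch of probing sources in parallel while respecting per-source limits. It assumes each source caps how many probes it accepts at once. The `Source` class, its fixed score tables, the simulated latency, and summation as the scoring function are all hypothetical. The sketch probes every object and does not reproduce the article's pruning strategy. It only shows how concurrency caps can be honored with a semaphore per source.

```python
import asyncio
from typing import Dict, List, Sequence, Tuple


class Source:
    """Hypothetical web source: fixed scores, simulated latency, capped concurrency."""

    def __init__(self, scores: Dict[str, float], max_concurrent: int, latency: float = 0.01):
        self.scores = scores
        self.limit = asyncio.Semaphore(max_concurrent)  # source-access constraint
        self.latency = latency
        self.active = 0
        self.peak = 0  # highest number of simultaneous probes observed

    async def probe(self, oid: str) -> float:
        async with self.limit:
            self.active += 1
            self.peak = max(self.peak, self.active)
            await asyncio.sleep(self.latency)  # stands in for web-access latency
            self.active -= 1
            return self.scores[oid]


async def _score_all(
    oids: Sequence[str], specs: Sequence[Tuple[Dict[str, float], int]]
) -> Tuple[Dict[str, float], List[int]]:
    # Build sources inside the running event loop so the semaphores bind to it.
    sources = [Source(scores, cap) for scores, cap in specs]

    async def score(oid: str) -> float:
        # Probe all sources for one object concurrently; combine by summing (assumption).
        grades = await asyncio.gather(*(src.probe(oid) for src in sources))
        return sum(grades)

    totals = await asyncio.gather(*(score(oid) for oid in oids))
    return dict(zip(oids, totals)), [src.peak for src in sources]


def parallel_topk(
    oids: Sequence[str], specs: Sequence[Tuple[Dict[str, float], int]], k: int
) -> Tuple[List[str], List[int]]:
    totals, peaks = asyncio.run(_score_all(oids, specs))
    top = sorted(oids, key=lambda oid: (-totals[oid], oid))[:k]
    return top, peaks
```

Each `(scores, cap)` pair in `specs` describes one hypothetical source. The returned `peaks` let a caller check that no source ever saw more than `cap` concurrent probes.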
Minimal Probing: Supporting Expensive Predicates for Top-k Queries
In SIGMOD, 2002
Cited by 100 (6 self)
This paper addresses the problem of evaluating ranked top-k queries with expensive predicates. As major DBMSs now all support expensive user-defined predicates for Boolean queries, we believe such support for ranked queries will be even more important: First, ranked queries often need to model user-specific concepts of preference, relevance, or similarity, which call for dynamic user-defined functions. Second, middleware systems must incorporate external predicates for integrating autonomous sources typically accessible only by per-object queries. Third, fuzzy joins are inherently expensive, as they are essentially user-defined operations that dynamically associate multiple relations. These predicates, being dynamically defined or externally accessed, cannot rely on index mechanisms to provide zero-time sorted output, and instead require a per-object probe to evaluate. The current standard sort-merge framework for ranked queries cannot efficiently handle such predicates because it must completely probe all objects before sorting and merging them to produce top-k answers. To minimize expensive probes, we thus develop the formal principle of "necessary probes," which determines whether a probe is absolutely required. We then propose Algorithm MPro which, by implementing the principle, is provably optimal with minimal probe cost. Further, we show that MPro can scale well and can be easily parallelized. Our experiments using both a real-estate benchmark database and synthetic datasets show that MPro enables significant probe reduction, which can be orders of magnitude faster than the standard scheme using complete probing.
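Below is a minimal sketch in the spirit of the necessary-probe principle. It is not the paper's code, and it rests on these assumptions:

- The combined score is the minimum of grades in [0, 1].
- One cheap, sorted predicate supplies each object's initial grade.
- The expensive predicates are probed in a fixed order.

Unprobed predicates are optimistically taken as 1, so an object's partial minimum is its ceiling score. Only the object at the top of the ceiling queue is probed, because no answer can be settled without it. All names are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple


def mpro_sketch(
    initial: Dict[str, float],
    probes: Sequence[Callable[[str], float]],
    k: int,
) -> Tuple[List[Tuple[str, float]], int]:
    """Top-k under a min-combined score, probing expensive predicates only when needed.

    initial: object id -> grade of the cheap, sorted predicate (in [0, 1]).
    probes:  expensive predicates, each returning a grade in [0, 1], probed in list order.
    """
    # Heap entries: (-ceiling, object id, probes done, min of grades seen so far).
    # Unprobed predicates are assumed to score 1, so the ceiling equals the partial min.
    heap = [(-grade, oid, 0, grade) for oid, grade in initial.items()]
    heapq.heapify(heap)
    results: List[Tuple[str, float]] = []
    probe_count = 0
    while heap and len(results) < k:
        _, oid, done, partial = heapq.heappop(heap)
        if done == len(probes):
            # Complete score >= every remaining ceiling, so it is safely in the top k.
            results.append((oid, partial))
            continue
        # The top object is incomplete: its next probe cannot be avoided.
        partial = min(partial, probes[done](oid))
        probe_count += 1
        heapq.heappush(heap, (-partial, oid, done + 1, partial))
    return results, probe_count
```

Objects whose ceiling never reaches the top of the queue are never probed. That is where the savings over complete probing come from.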
Optimizing Queries over Multimedia Repositories
1996
Cited by 74 (8 self)
Multimedia repositories and applications that retrieve multimedia information are becoming increasingly popular. In this paper, we study the problem of selecting objects from multimedia repositories, and show how this problem relates to the processing and optimization of selection queries in other contexts, e.g., when some of the selection conditions are expensive user-defined predicates. We find that the problem has unique characteristics that lead to interesting new research questions and results. This article presents an overview of the results in [1]. An expanded version of that paper is in preparation [2].
1 Query Model
In this section we first describe the model that we use for querying multimedia repositories. Then, we briefly review related models for querying text and image repositories.
1.1 Our Query Model
In our model, a multimedia repository consists of a set of multimedia objects, each with a distinct object identity. Each multimedia object has a set of attributes, like...
Optimization techniques for queries with expensive methods
ACM Transactions on Database Systems (TODS), 1998
Cited by 53 (3 self)
Object-Relational database management systems allow knowledgeable users to define new data types, as well as new methods (operators) for the types. This flexibility produces an attendant complexity, which must be handled in new ways for an Object-Relational database management system to be efficient. In this paper we study techniques for optimizing queries that contain time-consuming methods. The focus of traditional query optimizers has been on the choice of join methods and orders; selections have been handled by "pushdown" rules. These rules apply selections in an arbitrary order before as many joins as possible, using the assumption that selection takes no time. However, users of Object-Relational systems can embed complex methods in selections. Thus selections may take significant amounts of time, and the query optimization model must be enhanced. In this paper, we carefully define a query cost framework that incorporates both selectivity and cost estimates for selections. We develop an algorithm called Predicate Migration, and prove that it produces optimal plans for queries with expensive methods. We then describe our implementation of Predicate Migration in the commercial Object-Relational database management system Illustra, and discuss practical issues that affect our earlier assumptions. We compare Predicate Migration to a variety of simpler optimization techniques, and demonstrate that Predicate Migration is the best general solution to date. The alternative techniques we present may be useful for constrained workloads.
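The sketch below shows how selectivity and per-tuple cost together determine a good order for a chain of expensive selections. It assumes the selections are independent and uses the ratio rank = (selectivity - 1) / cost, applied in ascending order. A brute-force search over permutations is included to check the ordering. It covers only ordering selections on a single input. Predicate Migration itself also decides where expensive selections sit relative to joins, which is not modeled here. The `Selection` type and the example numbers are hypothetical.

```python
from itertools import permutations
from typing import List, NamedTuple, Sequence, Tuple


class Selection(NamedTuple):
    name: str
    selectivity: float  # fraction of input tuples that pass
    cost: float         # per-tuple evaluation cost


def rank(p: Selection) -> float:
    # More negative rank = filters more tuples per unit of cost, so apply it earlier.
    return (p.selectivity - 1) / p.cost


def expected_cost(order: Sequence[Selection]) -> float:
    # Each selection only sees the tuples that passed all earlier ones (independence assumed).
    total, passing = 0.0, 1.0
    for p in order:
        total += passing * p.cost
        passing *= p.selectivity
    return total


def rank_order(preds: Sequence[Selection]) -> List[Selection]:
    return sorted(preds, key=rank)


def brute_force_order(preds: Sequence[Selection]) -> Tuple[Selection, ...]:
    return min(permutations(preds), key=expected_cost)
```

With a cheap but unselective predicate, a very selective but expensive one, and a middling one, the rank order puts the expensive predicate last. Its cost is then paid only on the few tuples that survive the others.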
Query Execution Techniques for Caching Expensive Methods
In SIGMOD, 1996
Cited by 50 (8 self)
Object-Relational and Object-Oriented DBMSs allow users to invoke time-consuming ("expensive") methods in their queries. When queries containing these expensive methods are run on data with duplicate values, time is wasted redundantly computing methods on the same value. This problem has been studied in the context of programming languages, where "memoization" is the standard solution. In the database literature, sorting has been proposed to deal with this problem. We compare these approaches along with a third solution, a variant of unary hybrid hashing which we call Hybrid Cache. We demonstrate that Hybrid Cache always dominates memoization, and significantly outperforms sorting in many instances. This provides new insights into the tradeoff between hashing and sorting for unary operations. Additionally, our Hybrid Cache algorithm includes some new optimizations for unary hybrid hashing, which can be used for other applications such as grouping and duplicate elimination. We conclude...
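The sketch below covers the two baselines the abstract compares against, memoization and sorting. It assumes hashable input values and a memoization cache of bounded size. Hybrid Cache itself, a variant of unary hybrid hashing, is not reproduced here. The `CountingMethod` wrapper and the LRU eviction policy are illustrative choices, not taken from the paper.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable, List, Sequence, Tuple


class CountingMethod:
    """Wraps an 'expensive' method and counts how often it actually runs."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn
        self.calls = 0

    def __call__(self, value: Any) -> Any:
        self.calls += 1
        return self.fn(value)


def memoized_apply(
    values: Sequence[Hashable], method: Callable[[Any], Any], capacity: int
) -> List[Tuple[Any, Any]]:
    """Memoization with a bounded LRU cache (capacity >= 1); output keeps input order."""
    cache: "OrderedDict[Hashable, Any]" = OrderedDict()
    out = []
    for v in values:
        if v in cache:
            cache.move_to_end(v)  # mark as most recently used
        else:
            cache[v] = method(v)
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
        out.append((v, cache[v]))
    return out


def sort_based_apply(values: Sequence[Any], method: Callable[[Any], Any]) -> List[Tuple[Any, Any]]:
    """Sort first so duplicates are adjacent; remember only the previous result."""
    out = []
    last_value, last_result, have_last = None, None, False
    for v in sorted(values):
        if not have_last or v != last_value:
            last_value, last_result, have_last = v, method(v), True
        out.append((v, last_result))
    return out
```

When the cache holds fewer entries than there are distinct values, memoization can degrade to one method call per input row. Sorting pays an up-front sort but calls the method only once per distinct value.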
Optimizing top-k selection queries over multimedia repositories
2003
Cited by 23 (2 self)
Repositories of multimedia objects having multiple types of attributes (e.g., image, text) are becoming increasingly common. A query on these attributes will typically request not just a set of objects, as in the traditional relational query model (filtering), but also a grade of match associated with each object, which indicates how well the object matches the selection condition (ranking). Furthermore, unlike in the relational model, users may just want the k top-ranked objects for their selection queries, for a relatively small k. In addition to the differences in the query model, another peculiarity of multimedia repositories is that they may allow access to the attributes of each object only through indexes. In this paper, we investigate how to optimize the processing of top-k selection queries over multimedia repositories. The access characteristics of the repositories and the above query model lead to novel issues in query optimization. In particular, the choice of the indexes used to search the repository strongly influences the cost of processing the filtering condition. We define an execution space that is search-minimal, i.e., the set of indexes searched is minimal. Although the general problem of picking an optimal plan in the search-minimal execution space is NP-hard, we present an efficient algorithm that solves the problem optimally with respect to our cost model and execution space when the predicates in the query are independent. We also show that the problem of optimizing top-k selection queries can be viewed, in many cases, as that of evaluating more traditional selection conditions. Thus, ...
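Here is one way to read the claim that top-k selection can be viewed as evaluating a traditional selection condition. The sketch rests on these assumptions:

- The combined grade is the minimum of per-attribute grades in [0, 1].
- Each attribute has an index that answers "grade >= t" range searches.

If at least k objects meet threshold t on every attribute, every object outside that set scores below t, so the top k lie inside it. Otherwise t is lowered and the search repeats. The starting threshold, the step size, and the `index_range_search` helper are hypothetical. The paper's cost model and index choice are not modeled.

```python
from typing import Dict, List, Set, Tuple

Grades = Dict[str, Tuple[float, ...]]


def index_range_search(data: Grades, attr: int, threshold: float) -> Set[str]:
    # Stand-in for an index probe: ids whose grade on one attribute is >= threshold.
    return {oid for oid, grades in data.items() if grades[attr] >= threshold}


def topk_via_filter(data: Grades, k: int, threshold: float, step: float = 0.1) -> List[str]:
    n_attrs = len(next(iter(data.values())))
    while True:
        # Filtering condition: grade >= threshold on every attribute (an AND of range searches).
        candidates = set(data)
        for attr in range(n_attrs):
            candidates &= index_range_search(data, attr, threshold)
        if len(candidates) >= k or threshold == 0.0:
            # Rank only the (small) candidate set by the combined min-grade.
            ranked = sorted(candidates, key=lambda oid: (-min(data[oid]), oid))
            return ranked[:k]
        threshold = max(0.0, threshold - step)  # too few matches: relax and restart
```

The result does not depend on the exact threshold reached. Only the amount of index work does, which is why the choice of threshold and indexes is an optimization problem.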
Translating OQL into Monoid Comprehensions -- Stuck with Nested Loops?
1996
Cited by 14 (9 self)
This work tries to employ the monoid comprehension calculus --- which has proven to be an adequate framework to capture the semantics of modern object query languages featuring a family of collection types like sets, bags, and lists --- in a twofold manner: First, serving as a target language for the translation of ODMG OQL queries. We review work done in this field and also give comprehension calculus equivalents for the recently introduced OQL 1.2 concepts. Second, we use monoid comprehensions as the formalism in which we try to find efficient execution methods working on a rich set of physical structures (including indices, vertical and horizontal decomposition, etc.). The main problem coming up here is the "nested-loop nature" of the calculus expressions. While these loop-based semantics for evaluating comprehensions at least provide a way for executing OQL queries, their execution is almost always much less efficient than alternative physical algorithms of the database e...
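The sketch below illustrates the "nested-loop nature" problem on a hypothetical employees/departments schema. A monoid comprehension is modeled as a zero, a unit and an associative merge. A join comprehension is first evaluated literally with nested loops, then with a hash table on the join key. The hashed version is a standard alternative physical algorithm that returns the same result, shown here as an illustration. It is not the paper's specific translation scheme.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, NamedTuple


class Monoid(NamedTuple):
    zero: Any                         # identity element
    unit: Callable[[Any], Any]        # injects a single element
    merge: Callable[[Any, Any], Any]  # associative combine


BAG = Monoid([], lambda x: [x], lambda a, b: a + b)  # lists used as bags
SUM = Monoid(0, lambda x: x, lambda a, b: a + b)

Row = Dict[str, Any]


def join_nested_loop(m: Monoid, emps: List[Row], depts: List[Row], head: Callable[[Row, Row], Any]) -> Any:
    """Literal reading of  m{ head(e, d) | e <- emps, d <- depts, e.dno = d.dno }."""
    acc = m.zero
    for e in emps:
        for d in depts:
            if e["dno"] == d["dno"]:
                acc = m.merge(acc, m.unit(head(e, d)))
    return acc


def join_hashed(m: Monoid, emps: List[Row], depts: List[Row], head: Callable[[Row, Row], Any]) -> Any:
    """Same comprehension, but the inner generator is replaced by a hash lookup on dno."""
    by_dno: Dict[Any, List[Row]] = defaultdict(list)
    for d in depts:
        by_dno[d["dno"]].append(d)
    acc = m.zero
    for e in emps:
        for d in by_dno.get(e["dno"], []):
            acc = m.merge(acc, m.unit(head(e, d)))
    return acc
```

Swapping `BAG` for `SUM` with a head that returns 1 turns the same comprehension into a count. This shows how the calculus abstracts over collection and aggregate types.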
Evaluating top-k queries over web-accessible databases
ACM TODS, 2004
Cited by 11 (0 self)
A query to a web search engine usually consists of a list of keywords, to which the search engine responds with the best or “top” k pages for the query. This top-k query model is prevalent over multimedia collections in general, but also over plain relational data for certain applications. For example, consider a relation with information on available restaurants, including their location, price range for one diner, and overall food rating. A user who queries such a relation might simply specify the user’s location and target price range, and expect in return the best 10 restaurants in terms of some combination of proximity to the user, closeness of match to the target price range, and overall food rating. Processing such top-k queries efficiently is challenging for a number of reasons. One critical such reason is that, in many web applications, the relation attributes might not be available other than through external web-accessible form interfaces, which we will have to query repeatedly for a potentially large set of candidate objects. In this paper, we study how to process top-k queries efficiently in this setting, where the attributes for which users specify target values might be handled by external, autonomous sources with a variety of access interfaces. We present several algorithms for processing such queries, and evaluate them thoroughly using both synthetic and real web-accessible data.
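The restaurant example can be made concrete with a small scoring function. The abstract says only "some combination" of proximity, price match and rating. The equal weights, the linear normalizations, and the `max_dist` and `max_price_gap` parameters below are hypothetical. This naive version computes every score locally. That is exactly what becomes expensive when each attribute must be fetched through a web form.

```python
import heapq
import math
from typing import List, NamedTuple, Sequence, Tuple


class Restaurant(NamedTuple):
    name: str
    x: float
    y: float
    price: float   # price for one diner
    rating: float  # overall food rating, assumed normalized to [0, 1]


def score(r: Restaurant, user_xy: Tuple[float, float], target_price: float,
          max_dist: float = 10.0, max_price_gap: float = 50.0) -> float:
    # Each component is mapped to [0, 1]; higher is better.
    proximity = max(0.0, 1 - math.dist((r.x, r.y), user_xy) / max_dist)
    price_match = max(0.0, 1 - abs(r.price - target_price) / max_price_gap)
    return (proximity + price_match + r.rating) / 3  # equal weights (assumption)


def top_k(restaurants: Sequence[Restaurant], user_xy: Tuple[float, float],
          target_price: float, k: int = 10) -> List[Restaurant]:
    return heapq.nlargest(k, restaurants, key=lambda r: score(r, user_xy, target_price))
```

`heapq.nlargest` returns results best-first. With `k=10` it matches the "best 10 restaurants" query in the text.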
Bypassing Joins in Disjunctive Queries
1995
Cited by 10 (5 self)
In this paper we develop a novel optimization strategy for disjunctive queries involving join predicates. This work is an extension of our previously published approach [KMPS94] for optimizing disjunctive selection predicates by generating two output streams from selection operators: a "true"-stream for objects (tuples) satisfying the selection predicate and a "false"-stream for those objects not satisfying the predicate. Then, each stream undergoes an individual, "customized" optimization. Here, we extend the basic idea of [KMPS94] to disjunctive queries with join operators. Analogously to selections, we propose to generate two output streams from a join operator: one "true"-stream for pairs of objects (of the two input streams) that satisfy the join predicate and one "false"-stream for those pairs not satisfying it. In combination with the extended selection predicate processing, this provides a large potential for efficiently evaluating disjunctive queries because it allows to "byp...
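Below is a minimal sketch of the true/false-stream idea for a disjunction `A OR B` in which `B` is the expensive term. Evaluating `B` only on the false-stream of `A` means it never runs on tuples that already qualify. Because the two streams are disjoint, their union needs no duplicate elimination. The predicates are hypothetical. For brevity, the join version produces both streams with one nested loop, whereas the approach described above lets each stream be optimized individually.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def bypass_select(rows: Iterable[T], pred: Callable[[T], bool]) -> Tuple[List[T], List[T]]:
    """Selection with two outputs: a true-stream and a false-stream."""
    true_s: List[T] = []
    false_s: List[T] = []
    for row in rows:
        (true_s if pred(row) else false_s).append(row)
    return true_s, false_s


def disjunction(rows: Iterable[T], cheap: Callable[[T], bool], expensive: Callable[[T], bool]) -> List[T]:
    """rows WHERE cheap OR expensive; `expensive` only sees rows that `cheap` rejected."""
    t1, f1 = bypass_select(rows, cheap)
    t2, _ = bypass_select(f1, expensive)
    return t1 + t2  # disjoint streams, so no duplicate elimination is needed


def bypass_join(left: Iterable[T], right: List[U], theta: Callable[[T, U], bool]):
    """Join with two outputs: pairs satisfying theta and pairs that do not."""
    true_s: List[Tuple[T, U]] = []
    false_s: List[Tuple[T, U]] = []
    for lrow in left:
        for rrow in right:
            (true_s if theta(lrow, rrow) else false_s).append((lrow, rrow))
    return true_s, false_s


def disjunctive_join(left, right, theta1, theta2):
    """left JOIN right ON theta1 OR theta2, evaluating theta2 only on theta1's false-stream."""
    t1, f1 = bypass_join(left, right, theta1)
    t2, _ = bypass_select(f1, lambda pair: theta2(*pair))
    return t1 + t2
```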
Capability Sensitive Query Processing on Internet Sources
In Proceedings of the 15th International Conference on Data Engineering, 1999
Cited by 8 (1 self)
On the Internet, query processing capabilities of sources may be limited in diverse ways, and this makes answering even the simplest queries challenging. In this paper, we present a scheme called GenCompact for generating capability sensitive plans for selection queries. The generated query plans may be better than what existing query processing systems produce for three reasons: (1) the sources are guaranteed to support the query plans; (2) the plans take full advantage of the source capabilities; and (3) the plans may be more efficient since a larger space of plans is examined. Even though GenCompact considers many plans, it is relatively efficient because it uses effective data structures and pruning rules. We study the optimality of the plans generated as well as the efficiency of the plan generation process.
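Below is a deliberately simplified sketch of capability-sensitive planning for a conjunctive selection. It assumes a source's capabilities are given as the sets of attributes a query may bind. The plan pushes the largest supported subset of conditions to the source and applies the rest at the mediator. If no capability fits, the query cannot be answered at that source. GenCompact considers a much larger plan space and uses its own data structures and pruning rules, none of which this sketch attempts. `plan_selection`, `execute` and the car-listing data are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Sequence, Tuple

Conditions = Dict[str, object]
Plan = Tuple[Conditions, Conditions]


def plan_selection(conditions: Conditions, capabilities: Sequence[FrozenSet[str]]) -> Optional[Plan]:
    # Capabilities the source accepts that only bind attributes the query constrains.
    usable = [cap for cap in capabilities if cap.issubset(conditions)]
    if not usable:
        return None  # no supported way to send this query to the source
    best = max(usable, key=len)  # push as many conditions as possible to the source
    pushed = {a: v for a, v in conditions.items() if a in best}
    local = {a: v for a, v in conditions.items() if a not in best}
    return pushed, local


def execute(plan: Plan, source_rows: List[Dict[str, object]]) -> List[Dict[str, object]]:
    pushed, local = plan
    # Stand-in for the source evaluating the pushed conditions.
    fetched = [r for r in source_rows if all(r.get(a) == v for a, v in pushed.items())]
    # The mediator applies whatever the source could not.
    return [r for r in fetched if all(r.get(a) == v for a, v in local.items())]
```

Including `frozenset()` among the capabilities models a source that allows a full scan. That guarantees a (possibly expensive) plan always exists.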

