Improved Histograms for Selectivity Estimation of Range Predicates
, 1996
Cited by 211 (20 self)
Many commercial database systems maintain histograms to summarize the contents of relations and permit efficient estimation of query result sizes and access plan costs. Although several types of histograms have been proposed in the past, there has never been a systematic study of all histogram aspects, the available choices for each aspect, and the impact of such choices on histogram effectiveness. In this paper, we provide a taxonomy of histograms that captures all previously proposed histogram types and indicates many new possibilities. We introduce novel choices for several of the taxonomy dimensions, and derive new histogram types by combining choices in effective ways. We also show how sampling techniques can be used to reduce the cost of histogram construction. Finally, we present results from an empirical study of the proposed histogram types used in selectivity estimation of range predicates and identify the histogram types that have the best overall performance.
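To make the idea of histogram-based selectivity estimation concrete, here is a minimal sketch of a generic equi-depth histogram with a uniform-spread assumption inside each bucket. It is not one of the histogram types the paper proposes; the function names `build_equi_depth` and `estimate_range_selectivity` are hypothetical and chosen only for illustration.

```python
def build_equi_depth(values, num_buckets):
    """Build (low, high, count) buckets that each hold roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    buckets = []
    for b in range(num_buckets):
        start = b * n // num_buckets
        end = (b + 1) * n // num_buckets
        if start == end:
            continue  # more buckets than values: skip empty slices
        chunk = ordered[start:end]
        buckets.append((chunk[0], chunk[-1], len(chunk)))
    return buckets


def estimate_range_selectivity(buckets, lo, hi):
    """Estimate the fraction of rows with lo <= value <= hi.

    Assumption (illustrative only): values are spread uniformly
    between the low and high boundary of each bucket.
    """
    total = sum(count for _, _, count in buckets)
    if total == 0:
        return 0.0
    matched = 0.0
    for b_lo, b_hi, count in buckets:
        overlap_lo = max(lo, b_lo)
        overlap_hi = min(hi, b_hi)
        if overlap_lo > overlap_hi:
            continue  # the query range misses this bucket entirely
        if b_hi == b_lo:
            matched += count  # single-valued bucket that lies inside the range
        else:
            matched += count * (overlap_hi - overlap_lo) / (b_hi - b_lo)
    return matched / total
```

For example, `estimate_range_selectivity(build_equi_depth(range(1, 101), 4), 26, 50)` covers exactly one of four equal buckets. An optimizer would multiply such an estimate by the relation's cardinality to predict the result size of a range predicate.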
Query Optimization
, 1996
Cited by 102 (2 self)
Imagine yourself standing in front of an exquisite buffet filled with numerous delicacies. Your goal is to try them all out, but you need to decide in what order. What exchange of tastes will maximize the overall pleasure of your palate? Although much less pleasurable and subjective, that is the type of problem that query optimizers are called to solve. Given a query, there are many plans that a database management system (DBMS) can follow to process it and produce its answer. All plans are equivalent in terms of their final output but vary in their cost, i.e., the amount of time that they need to run. What is the plan that needs the least amount of time? Such query optimization is absolutely necessary in a DBMS. The cost difference between two alternatives can be enormous. For example, consider the following database schema, which will be...
The LDL System Prototype
- IEEE Transactions on Knowledge and Data Engineering
, 1990
Cited by 80 (1 self)
The LDL system provides a declarative logic-based language and integrates relational database and logic programming technologies so as to support advanced data and knowledge-based applications. This paper gives a comprehensive overview of the system and describes the LDL language and the compilation techniques employed to translate LDL queries into target queries on the stored data. The paper further describes the architecture and runtime environment of the system and the optimization techniques employed to improve the performance and assure the safety of the compiled queries. The paper concludes with an account of the experience gained so far with the system, and discusses application areas where the LDL approach appears to be particularly effective.
Heuristic and Randomized Optimization for the Join Ordering Problem
- VLDB Journal
, 1997
Cited by 56 (2 self)
Recent developments in database technology, such as deductive database systems, have given rise to the demand for new, cost-effective optimization techniques for join expressions. In this paper many different algorithms that compute approximate solutions for optimizing join orders are studied, since traditional dynamic programming techniques are not appropriate for complex problems. First, two possible solution spaces, the space of left-deep and bushy processing trees, respectively, are evaluated from a statistical point of view. The result is that the common limitation to left-deep processing trees is only advisable for certain join graph types. Basically, optimizers from three classes are analysed: heuristic, randomized and genetic algorithms. Each one is extensively scrutinized with respect to its working principle and its fitness for the desired application. It turns out that randomized and genetic algorithms are well suited for optimizing join expressions. They generate solutions of...
Parallel Query Processing
- ACM Computing Surveys
, 1993
Cited by 54 (0 self)
With relations growing larger and queries becoming more complex, parallel query processing is an increasingly attractive option for improving the performance of database systems. The objective of this paper is to examine the various issues encountered in parallel query processing and the techniques available for addressing these issues. The focus of the paper is on the join operation with both sort-merge join and hash joins being considered. Three types of parallelism can be exploited, namely intra-operator, inter-operator, and inter-query parallelism. In intra-operator parallelism the major issue is task creation, and the objective is to split a join operation into tasks in a manner such that the load can be spread evenly across a given number of processors. This is a challenge when the values on the join attribute are not uniformly distributed. Inter-operator parallelism can be achieved either through parallel execution of independent operations or through pipelining. In either case,...
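The load-balancing difficulty mentioned for intra-operator parallelism can be illustrated with a small sketch. It assumes a plain modulo partitioning of integer join keys across workers, which is a simplification chosen for illustration and not a technique taken from the survey; `partition_loads` is a hypothetical name.

```python
def partition_loads(join_keys, num_workers):
    """Count how many tuples each worker receives under modulo partitioning.

    Assumption (illustrative only): join keys are non-negative integers
    and a tuple with key k is sent to worker k % num_workers.
    """
    loads = [0] * num_workers
    for key in join_keys:
        loads[key % num_workers] += 1
    return loads


# Uniform keys spread evenly across the workers.
uniform_loads = partition_loads(list(range(100)), 4)

# Skewed keys: one hot value (7) accounts for 90 of 100 tuples,
# so a single worker ends up with almost all of the work.
skewed_loads = partition_loads([7] * 90 + list(range(10)), 4)
```

With the skewed input, the worker that owns the hot key receives 92 of the 100 tuples, so the join finishes only when that one worker does, regardless of how many processors are available.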
Optimization techniques for queries with expensive methods
- ACM Transactions on Database Systems (TODS)
, 1998
Cited by 53 (3 self)
Object-Relational database management systems allow knowledgeable users to define new data types, as well as new methods (operators) for the types. This flexibility produces an attendant complexity, which must be handled in new ways for an Object-Relational database management system to be efficient. In this paper we study techniques for optimizing queries that contain time-consuming methods. The focus of traditional query optimizers has been on the choice of join methods and orders; selections have been handled by "pushdown" rules. These rules apply selections in an arbitrary order before as many joins as possible, using the assumption that selection takes no time. However, users of Object-Relational systems can embed complex methods in selections. Thus selections may take significant amounts of time, and the query optimization model must be enhanced. In this paper, we carefully define a query cost framework that incorporates both selectivity and cost estimates for selections. We develop an algorithm called Predicate Migration, and prove that it produces optimal plans for queries with expensive methods. We then describe our implementation of Predicate Migration in the commercial Object-Relational database management system Illustra, and discuss practical issues that affect our earlier assumptions. We compare Predicate Migration to a variety of simpler optimization techniques, and demonstrate that Predicate Migration is the best general solution to date. The alternative techniques we present may be useful for constrained workloads.
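A small sketch can show why both selectivity and cost matter when ordering selections. It covers only the simplified case of independent selections on a single input, ordered by the standard rank metric (selectivity - 1) / cost; it is not the Predicate Migration algorithm itself, which also has to place selections relative to joins. The names `expected_cost` and `order_by_rank` and the sample predicates are hypothetical.

```python
from itertools import permutations


def expected_cost(preds):
    """Expected per-tuple cost of applying predicates in the given order.

    Each predicate is (name, selectivity, cost_per_tuple). A predicate is
    only evaluated on tuples that survived all earlier predicates.
    """
    total = 0.0
    surviving = 1.0
    for _, selectivity, cost in preds:
        total += surviving * cost
        surviving *= selectivity
    return total


def order_by_rank(preds):
    """Sort by ascending rank = (selectivity - 1) / cost.

    Assumptions (illustrative only): predicates are independent
    and every cost is strictly positive.
    """
    return sorted(preds, key=lambda p: (p[1] - 1.0) / p[2])


sample = [
    ("cheap_unselective", 0.9, 1.0),
    ("expensive_selective", 0.1, 100.0),
    ("cheap_selective", 0.2, 2.0),
]
ranked = order_by_rank(sample)
best_by_search = min(permutations(sample), key=expected_cost)
```

In this sample the rank order runs the cheap, selective predicate first and the expensive method last, and its expected cost matches the best order found by exhaustive search over all six permutations.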
Parallel Evaluation of Multi-Join Queries
- In Proc. ACM SIGMOD Int'l. Conf
, 1995
Cited by 38 (0 self)
A number of execution strategies for parallel evaluation of multi-join queries have been proposed in the literature; their performance was evaluated by simulation. In this paper we give a comparative performance evaluation of four execution strategies by implementing all of them on the same parallel database system, PRISMA/DB. Experiments were run on up to 80 processors. The basic strategy is to first determine an execution schedule with minimum total cost and then parallelize this schedule with one of the four execution strategies. These strategies, coming from the literature, are named: Sequential Parallel, Synchronous Execution, Segmented Right-Deep, and Full Parallel. Based on the experiments, clear guidelines are given on when to use which strategy.
Optimizing Large Join Queries in Mediation Systems
- International Conference on Database Theory (ICDT)
, 1999
Cited by 37 (11 self)
In data integration systems, queries posed to a mediator need to be translated into a sequence of queries to the underlying data sources. In a heterogeneous environment, with sources of diverse and limited query capabilities, not all the translations are feasible. In this paper, we study the problem of finding feasible and efficient query plans for mediator systems. We consider conjunctive queries on mediators and model the source capabilities through attribute-binding adornments. We use a simple cost model that focuses on the major costs in mediation systems, those involved with sending queries to sources and getting answers back. Under this metric, we develop two algorithms for source query sequencing -- one based on a simple greedy strategy and another based on a partitioning scheme. The first algorithm produces optimal plans in some scenarios, and we show a linear bound on its worst case performance when it misses optimal plans. The second algorithm generates optimal plans in mor...
Iterative Dynamic Programming: A New Class of Query Optimization Algorithms
- ACM Trans. on Database Systems
, 1998
Cited by 36 (5 self)
The query optimizer is one of the most important components of a database system. Most commercial query optimizers today are based on a dynamic-programming algorithm, as proposed in [SAC+79]. While this algorithm produces good optimization results (i.e., good plans), its high complexity can be prohibitive if complex queries need to be processed, new query execution techniques need to be integrated, or in certain programming environments (e.g., distributed database systems). In this paper, we present and thoroughly evaluate a new class of query optimization algorithms that are based on a principle that we call iterative dynamic programming, or IDP for short. IDP has several important advantages: First, IDP-algorithms produce the best plans of all known algorithms in situations in which dynamic programming is not viable because of its high complexity. Second, some IDP variants are adaptive and produce as good plans as dynamic programming if dynamic programming is viable an...
Multiway Spatial Joins
- ACM Transactions on Database Systems (TODS)
, 2001
Cited by 28 (6 self)
Due to the evolution of Geographical Information Systems, large collections of spatial data having various thematic contents are currently available. As a result, the interest of users is not limited to simple spatial selections and joins, but complex query types that implicate numerous spatial inputs become more common. Although several algorithms have been proposed for computing the result of pairwise spatial joins, limited work exists on processing and optimization of multiway spatial joins. In this article, we review pairwise spatial join algorithms and show how they can be combined for multiple inputs. In addition, we explore the application of synchronous traversal (ST), a methodology that processes synchronously all inputs without producing intermediate results. Then, we integrate the two approaches in an engine that includes ST and pairwise algorithms, using dynamic programming to determine the optimal execution plan. The results show that, in most cases, multiway spatial joins are best processed by combining ST with pairwise methods. Finally, we study the optimization of very large queries by employing randomized search algorithms.

