Results 1 - 10 of 55
The state of the art in distributed query processing
ACM Computing Surveys, 2000
Cited by 182 (2 self)
Distributed data processing is fast becoming a reality. Businesses want to have it for many reasons, and they often must have it in order to stay competitive. While much of the infrastructure for distributed data processing is already in place (e.g., modern network technology), there are a number of issues which still make distributed data processing a complex undertaking: (1) distributed systems can become very large, involving thousands of heterogeneous sites including PCs and mainframe server machines; (2) the state of a distributed system changes rapidly because the load of sites varies over time and new sites are added to the system; (3) legacy systems need to be integrated; such legacy systems usually have not been designed for distributed data processing and now need to interact with other (modern) systems in a distributed environment. This paper presents the state of the art of query processing for distributed database and information systems. The paper presents the "textbook" architecture for distributed query processing and a series of techniques that are particularly useful for distributed database systems. These techniques include special join techniques, techniques to exploit intra-query parallelism, techniques to reduce communication costs, and techniques to exploit caching and replication of data. Furthermore, the paper discusses different kinds of distributed systems such as client-server, middleware (multi-tier), and heterogeneous database systems and shows how query processing works in these systems.
Categories and subject descriptors: E.5 [Data]: Files; H.2.4 [Database Management Systems]: distributed databases, query processing; H.2.5 [Heterogeneous Databases]: data translation
General terms: algorithms; performance
Additional key words and phrases: query optimization; query execution; client-server databases; middleware; multi-tier architectures; database application systems; wrappers; replication; caching; economic models for query processing; dissemination-based information systems
Heuristic and Randomized Optimization for the Join Ordering Problem
VLDB Journal, 1997
Cited by 56 (2 self)
Recent developments in database technology, such as deductive database systems, have given rise to the demand for new, cost-effective optimization techniques for join expressions. In this paper many different algorithms that compute approximate solutions for optimizing join orders are studied, since traditional dynamic programming techniques are not appropriate for complex problems. First, two possible solution spaces, the space of left-deep and bushy processing trees, respectively, are evaluated from a statistical point of view. The result is that the common limitation to left-deep processing trees is only advisable for certain join graph types. Basically, optimizers from three classes are analysed: heuristic, randomized and genetic algorithms. Each one is extensively scrutinized with respect to its working principle and its fitness for the desired application. It turns out that randomized and genetic algorithms are well suited for optimizing join expressions. They generate solutions of...
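To give a sense of scale for the two solution spaces compared in this abstract, the sketch below counts join trees over n relations when cross products are allowed and operand order matters: n! left-deep trees and (2(n-1))!/(n-1)! bushy trees. These are standard combinatorial counts rather than figures taken from the paper, and the function names are illustrative only.

```python
from math import factorial


def count_left_deep(n: int) -> int:
    """Left-deep trees: one tree per ordering of the n relations."""
    return factorial(n)


def count_bushy(n: int) -> int:
    """Bushy trees: Catalan(n-1) tree shapes times n! leaf orderings,
    which simplifies to (2(n-1))! / (n-1)!."""
    return factorial(2 * (n - 1)) // factorial(n - 1)


if __name__ == "__main__":
    for n in (2, 3, 4, 8, 12):
        print(n, count_left_deep(n), count_bushy(n))
```

The bushy space grows much faster than the left-deep one, which is one reason exhaustive search is often replaced by heuristic or randomized search for large queries.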
Parallel Query Processing
ACM Computing Surveys, 1993
Cited by 54 (0 self)
With relations growing larger and queries becoming more complex, parallel query processing is an increasingly attractive option for improving the performance of database systems. The objective of this paper is to examine the various issues encountered in parallel query processing and the techniques available for addressing these issues. The focus of the paper is on the join operation with both sort-merge join and hash joins being considered. Three types of parallelism can be exploited, namely intra-operator, inter-operator, and inter-query parallelism. In intra-operator parallelism the major issue is task creation, and the objective is to split a join operation into tasks in a manner such that the load can be spread evenly across a given number of processors. This is a challenge when the values on the join attribute are not uniformly distributed. Inter-operator parallelism can be achieved either through parallel execution of independent operations or through pipelining. In either case,...
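The task-creation problem mentioned in this abstract can be illustrated with a minimal sketch, assuming the simplest possible scheme: tuples are assigned to processors by hashing the join attribute. The function name and the synthetic key lists are hypothetical and not from the paper; integer keys are used so that the hash values are deterministic.

```python
from collections import Counter


def partition_load(join_keys, num_workers):
    """Return the number of tuples each worker receives when tuples are
    routed by hash(join key) modulo the number of workers."""
    load = Counter(hash(k) % num_workers for k in join_keys)
    return [load[w] for w in range(num_workers)]


if __name__ == "__main__":
    uniform = list(range(1000))            # every key value occurs once
    skewed = [7] * 700 + list(range(300))  # one key value dominates
    print(partition_load(uniform, 4))      # [250, 250, 250, 250]
    print(partition_load(skewed, 4))       # [75, 75, 75, 775]
```

With uniformly distributed keys the load is even, but a single frequent value sends most tuples to one worker, which is the skew case the abstract calls a challenge.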
Using Segmented Right-Deep Trees for the Execution of Pipelined Hash Joins
In Proc. Int'l. Conf. on Very Large Data Bases, 1992
Cited by 47 (9 self)
In this paper, we explore the execution of pipelined hash joins in a multiprocessor-based database system. To improve the query execution, an innovative approach on query execution tree selection is proposed to exploit segmented right-deep trees, which are bushy trees of right-deep subtrees. We first derive an analytical model for the execution of a pipeline segment, and then, in light of the model, develop heuristic schemes to determine the query execution plan based on a segmented right-deep tree so that the query can be efficiently executed. As shown by our simulation, the proposed approach, without incurring additional overhead on plan execution, possesses more flexibility in query plan generation, and leads to query plans of significantly better performance than those achievable by the previous schemes using right-deep trees.
Rapid Bushy Join-order Optimization with Cartesian Products
In Proc. of the ACM SIGMOD Conf. on Management of Data, 1996
Cited by 45 (1 self)
Query optimizers often limit the search space for join orderings, for example by excluding Cartesian products in subplans or by restricting plan trees to left-deep vines. Such exclusions are widely assumed to reduce optimization effort while minimally affecting plan quality. However, we show that searching the complete space of plans is more affordable than has been previously recognized, and that the common exclusions may be of little benefit. We start by presenting a Cartesian product optimizer that requires at most a few seconds of workstation time to search the space of bushy plans for products of up to 15 relations. Building on this result, we present a join-order optimizer that achieves a similar level of performance, and retains the ability to include Cartesian products in subplans wherever appropriate. The main contribution of the paper is in fully separating join-order enumeration from predicate analysis, and in showing that the former problem in particular can be solved swift...
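As a rough illustration of what searching the complete space of bushy plans for a Cartesian product involves, here is a minimal dynamic-programming sketch over subsets of relations. It is not the paper's algorithm: the cost model (sum of the cardinalities of all intermediate and final results) and the function name are assumptions made for this example.

```python
def best_product_plan(cards):
    """Find a cheapest bushy plan for the Cartesian product of relations
    with the given cardinalities. Subsets are encoded as bitmasks.
    Assumed cost model: cost(plan) = sum of result sizes of all products."""
    n = len(cards)
    full = (1 << n) - 1

    # size[s] = product of the cardinalities of the relations in subset s.
    size = [1] * (full + 1)
    for s in range(1, full + 1):
        low = s & -s  # lowest set bit of s
        size[s] = size[s ^ low] * cards[low.bit_length() - 1]

    cost = [0] * (full + 1)     # base relations cost nothing
    plan = [None] * (full + 1)
    for i in range(n):
        plan[1 << i] = i

    for s in range(1, full + 1):
        if s & (s - 1) == 0:
            continue  # single relation, nothing to combine
        best = None
        left = (s - 1) & s  # enumerate proper non-empty subsets of s
        while left:
            right = s ^ left
            if left < right:  # consider each unordered split once
                c = cost[left] + cost[right]
                if best is None or c < best:
                    best = c
                    plan[s] = (plan[left], plan[right])
            left = (left - 1) & s
        cost[s] = best + size[s]

    return cost[full], plan[full]


if __name__ == "__main__":
    print(best_product_plan([2, 3, 4]))  # (30, ((0, 1), 2))
```

Because every subset is split in every possible way, the work grows roughly as 3^n, which is still small in absolute terms for the 15 relations mentioned in the abstract.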
Mariposa: A new architecture for distributed data
Proc. 10th Int. Conf. on Data Engineering, 1994
Cited by 42 (3 self)
We describe the design of Mariposa, an experimental distributed data management system that provides high performance in an environment of high data mobility and heterogeneous host capabilities. The Mariposa design unifies the approaches taken by distributed file systems and distributed databases. In addition, Mariposa provides a general, flexible platform for the development of new algorithms for distributed query optimization, storage management, and scalable data storage structures. This flexibility is primarily due to a unique rule-based design that permits autonomous, local-knowledge decisions to be made regarding data placement, query execution location, and storage management.
Multi-join optimization for symmetric multiprocessors
Int. Conf. on VLDB, 1993
Cited by 35 (0 self)
This paper looks at the problem of multi-join query optimization for symmetric multiprocessors. Optimization algorithms based on dynamic programming and greedy heuristics are described that, unlike traditional methods, include memory resources and pipelining in their cost model. An analytical model is presented and used to compare the quality of plans produced by each optimization algorithm. Experimental results show that, while dynamic programming produces the best plans, simple heuristics often do nearly as well. The same results are also used to highlight the advantages of bushy execution trees over more restricted tree shapes.
The Complexity of Transformation-Based Join Enumeration
1997
Cited by 25 (3 self)
Query optimizers that explore a search space exhaustively using transformation rules usually apply all possible rules on each alternative, and stop when no new information is produced. A memoizing structure was proposed in [McK93] to improve the re-use of common subexpressions, thus improving the efficiency of the search considerably. However, a question that remained open is: what is the complexity of the transformation-based enumeration process? In particular, with n the number of relations, does it achieve the O(3^n) lower bound established by [OL90]? In this paper we examine the problem of duplicates in transformation-based enumeration. In general, different sequences of transformation rules may end up deriving the same element, and the optimizer must detect and discard these duplicate elements generated by multiple paths. We show that the usual commutativity/associativity rules for joins generate O(4^n) duplicate opera...
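One standard way to see where a 3^n term comes from is to count, for every subset of the n relations, the ways of splitting it into two non-empty parts; the total is 3^n - 2^(n+1) + 1 ordered splits. The sketch below checks this closed form by brute-force enumeration. It illustrates the counting argument only and is not the derivation used in the paper or in [OL90]; the function name is hypothetical.

```python
def count_ordered_splits(n: int) -> int:
    """Count pairs (left, right) of disjoint non-empty sets whose union is
    some subset of n relations. Subsets are encoded as bitmasks."""
    total = 0
    for s in range(1, 1 << n):
        left = (s - 1) & s  # enumerate proper non-empty subsets of s
        while left:
            total += 1      # right part is s ^ left, also non-empty
            left = (left - 1) & s
    return total


if __name__ == "__main__":
    for n in range(1, 11):
        closed_form = 3**n - 2 ** (n + 1) + 1
        assert count_ordered_splits(n) == closed_form
        print(n, closed_form)
```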
Optimizing Join Orders
1993
Cited by 23 (3 self)
Recent developments in database technology, such as deductive database systems, have given rise to the demand for new, cost-effective optimization techniques for join expressions. In this paper many different algorithms that compute approximate solutions for optimizing join orders are studied, since traditional dynamic programming techniques are not appropriate for complex problems. First, two possible solution spaces, the space of left-deep and bushy processing trees, respectively, are evaluated from a statistical point of view. The result is that the common limitation to left-deep processing trees is, from a purely statistical point of view, only advisable for certain cost models. Basically, optimizers from three classes are analysed: heuristic, randomized and genetic algorithms. Each one is extensively scrutinized with respect to its working principle and its fitness for the desired application. It turns out that randomized and genetic algorithms are well suited for op...
On the Complexity of Generating Optimal Plans with Cross Products (Extended Abstract)
In PODS Conference, 1997
Cited by 21 (1 self)
In modern advanced database systems the optimizer is often faced with the problem of finding optimal evaluation strategies for queries involving a large number of joins. Examples are queries generated by deductive database systems and path expressions in object-oriented database systems. The best plan can be found in the very large search space of bushy trees where plans are allowed to contain cross products. A general question arises: For which (sub-) problems can we expect to find polynomial algorithms generating the best plan? We attack this question from both ends of the spectrum. First, we show that we cannot expect to find any polynomial algorithm for any subproblem as long as optimal bushy trees are to be generated. More specifically, we show that the problem is NP-hard independent of the query graph.

