Query optimization in database systems
- ACM Computing Surveys, 1984
Cited by 194 (0 self)
Efficient methods of processing unanticipated queries are a crucial prerequisite for the success of generalized database management systems. A wide variety of approaches to improve the performance of query evaluation algorithms have been proposed: logic-based and semantic transformations, fast implementations of basic operations, and combinatorial or heuristic algorithms for generating alternative access plans and choosing among them. These methods are presented in the framework of a general query evaluation procedure using the relational calculus representation of queries. In addition, nonstandard query optimization issues such as higher level query evaluation, query optimization in distributed databases, and use of database machines are addressed. The focus, however, is on query optimization in centralized database systems.
Interleaving A Join Sequence With Semijoins In Distributed Query Processing
- IEEE Transactions on Parallel and Distributed Systems, 1992
Cited by 12 (7 self)
In distributed query processing, the conventional approach to reduce the amount of data transmission is to first apply a sequence of semijoins as "reducers" and then ship the resultant relations to the final site to carry out the join operations. Recently, it has been shown that the approach of applying a combination of joins and semijoins as reducers can lead to substantially larger reduction on data transmission required. In this paper, we develop an efficient heuristic approach to determine an effective sequence of semijoin and join reducers. Semijoins whose execution will reduce the amount of data transmission required to perform a join sequence are termed beneficial semijoins for that join sequence. Note that beneficial semijoins include the conventional profitable semijoins and the gainful semijoins that are not profitable themselves but become beneficial due to the inclusion of join reducers. This type of dependency between semijoin and join reducers complicates the identificati...
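To make the semijoin "reducer" idea in this abstract concrete, here is a minimal Python sketch. The toy relations, the helper names (`semijoin`, `join`, `shipped_values`), and the cost measure (number of attribute values sent between sites) are assumptions made for this illustration only; the sketch does not reproduce the paper's heuristic for interleaving joins and semijoins.

```python
def semijoin(r, s, key):
    """Return the tuples of r whose key value also appears in s."""
    s_keys = {row[key] for row in s}
    return [row for row in r if row[key] in s_keys]


def join(r, s, key):
    """Equi-join r and s on key using a hash index built over s."""
    index = {}
    for row in s:
        index.setdefault(row[key], []).append(row)
    return [{**a, **b} for a in r for b in index.get(a[key], [])]


def shipped_values(rel):
    """Assumed cost measure: count of attribute values transmitted."""
    return sum(len(row) for row in rel)


# Hypothetical data: orders lives at site 1, customers at site 2,
# and the join is to be carried out at site 2.
orders = [{"cust": c, "item": i}
          for c, i in [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]]
customers = [{"cust": 1, "name": "Ann"}, {"cust": 3, "name": "Bob"}]

# Plan A: ship all of orders to site 2.
cost_without = shipped_values(orders)

# Plan B: ship only the join column of customers to site 1,
# reduce orders there, then ship the reduced relation back.
customer_keys = [{"cust": k} for k in sorted({row["cust"] for row in customers})]
reduced = semijoin(orders, customer_keys, "cust")
cost_with = shipped_values(customer_keys) + shipped_values(reduced)

# Both plans produce the same join result at site 2.
same_result = join(reduced, customers, "cust") == join(orders, customers, "cust")
```

In this toy case the semijoin pays for itself because the key projection that is shipped is smaller than the tuples it eliminates; with a less selective join column the same step could increase total transmission.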
A state transition model for distributed query processing
- ACM Transactions on Database Systems, 1986
Cited by 6 (0 self)
A state transition model for the optimization of query processing in a distributed database system is presented. The problem is parametrized by means of a state describing the amount of processing that has been performed at each site where the database is located. A state transition occurs each time a new join or semijoin is executed. Dynamic programming is used to compute recursively the costs of the states and the globally optimal solution, taking into account communication and local processing costs. The state transition model is general enough to account for the possibility of parallel processing among the various sites, as well as for redundancy in the database. The model also permits significant reductions of the necessary computations by taking advantage of simple additivity and site-uniformity properties of a cost model, and of clever strategies that improve on the basic dynamic programming algorithm.
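The flavor of dynamic programming over states with an additive cost can be sketched in Python as follows. This is the classic subset-based search for a left-deep join order, swapped in as a simpler stand-in: it is not the paper's state transition model, and it ignores sites, communication, and parallelism. The function name, the cardinalities, the selectivities, and the cost measure (sum of intermediate result sizes) are all assumptions for illustration.

```python
from itertools import combinations


def best_join_order(cards, sel):
    """Find a cheapest left-deep join order by dynamic programming.

    cards: {relation name: cardinality}
    sel:   {frozenset({a, b}): join selectivity}; missing pairs default to 1.0
    A state is the set of relations joined so far; a transition joins one
    more relation. Returns (cost, result size, order) for the full set.
    """
    names = sorted(cards)
    # Base states: a single relation, nothing joined yet, so cost is zero.
    best = {frozenset([n]): (0.0, float(cards[n]), (n,)) for n in names}
    for k in range(2, len(names) + 1):
        for subset in combinations(names, k):
            state = frozenset(subset)
            for last in subset:
                prev = state - {last}
                cost, size, order = best[prev]
                # Estimated size of the result after joining `last`.
                new_size = size * cards[last]
                for other in prev:
                    new_size *= sel.get(frozenset((other, last)), 1.0)
                # Additive cost: previous cost plus the new intermediate size.
                new_cost = cost + new_size
                if state not in best or new_cost < best[state][0]:
                    best[state] = (new_cost, new_size, order + (last,))
    return best[frozenset(names)]


# Hypothetical statistics for three relations.
cards = {"A": 1000, "B": 10, "C": 100}
sel = {frozenset(("A", "B")): 0.01, frozenset(("B", "C")): 0.1}
total_cost, result_size, order = best_join_order(cards, sel)
```

Because the cost is additive, the best cost of each state depends only on the best costs of its predecessor states, which is what lets the table be filled in order of increasing subset size.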
Scalable Multi-Query Optimization for Exploratory Queries over Federated Scientific Databases
Cited by 5 (1 self)
The diversity and large volumes of data processed in the Natural Sciences today has led to a proliferation of highly specialized and autonomous scientific databases with inherent and often intricate relationships. As a user-friendly method for querying this complex, ever-expanding network of sources for correlations, we propose exploratory queries. Exploratory queries are loosely-structured, hence requiring only minimal user knowledge of the source network. Evaluating an exploratory query usually involves the evaluation of many distributed queries. As the number of such distributed queries can quickly become large, we attack the optimization problem for exploratory queries by proposing several multi-query optimization algorithms that compute a global evaluation plan while minimizing the total communication cost, a key bottleneck in distributed settings. The proposed algorithms are necessarily heuristics, as computing an optimal global evaluation plan is shown to be NP-hard. Finally, we present an implementation of our algorithms, along with experiments that illustrate their potential not only for the optimization of exploratory queries, but also for the multi-query optimization of large batches of standard queries.
Network-Aware Join Processing in Global-Scale Database Federations
Cited by 3 (1 self)
We introduce join scheduling algorithms that employ a balanced network utilization metric to optimize the use of all network paths in a global-scale database federation. This metric allows algorithms to exploit excess capacity in the network, while avoiding narrow, long-haul paths. We give a two-approximate, polynomial-time algorithm for serial (left-deep) join schedules. We also present extensions to this algorithm that explore parallel schedules, reduce resource usage, and define tradeoffs between computation and network utilization. We evaluate these techniques within the SkyQuery federation of Astronomy databases using spatial-join queries submitted by SkyQuery’s users. Experiments show that our algorithms realize near-optimal network utilization with minor computational overhead.
BioScout: A Life-Science Query Monitoring System
Scientific data are available through an increasing number of heterogeneous, independently evolving, sources. Although the sources themselves are independently evolving, the data stored in them are not. There exist inherent and intricate relationships between the distributed data-sets and scientists are routinely required to write distributed queries in this setting. Being non-experts in computer science, the scientists are faced with two major challenges: (i) How to express such distributed queries. This is a non-trivial task, even if we assume that scientists are familiar with query languages like SQL. Such queries can get arbitrarily complex as more sources are considered; (ii) How to efficiently evaluate such distributed queries. An efficient evaluation must account for batches of hundreds (or even thousands) of submitted queries and must optimize all of them as a whole. In this demo, we focus on the biological domain for illustration purposes (our solutions are applicable to other scientific domains) and we present a system, called BioScout, that offers solutions in both of the above challenges. In more detail, we demonstrate the following functionality: (i) in BioScout, scientists draw their queries graphically, resulting in a query graph. The scientist is unaware of the query language used or of any optimization issues. Given the query graph, the system is able to generate, as a first step, an optimal query plan for the submitted query; (ii) BioScout uses four different strategies to combine the optimal query plans of individual queries to generate a global query plan for all the submitted queries. In the demo, we illustrate graphically how each of the four strategies works.
GBO Preliminary Research Proposal
Gray and Szalay [1] documented the data avalanche problem in the sciences in which improvements in physical instruments and better data pipelines lead to an exponential growth in data size. Paralleling this exponential trend is the accumulation

