Results 1 - 6 of 6
Query Optimization for Semistructured Data
, 1997
Cited by 23 (7 self)
With the emerging prevalence of semistructured data -- data that may be irregular or incomplete -- it is important to develop efficient query processing techniques for such data. This paper describes the query processor of Lore, a DBMS for semistructured data, and focuses particularly on the cost-based query optimization techniques we have developed and implemented for a semistructured environment. While all of the usual problems associated with cost-based query optimization apply to semistructured data as well, a number of additional problems arise, such as vastly different query execution strategies for different semistructured databases, more complicated notions of database statistics, and novel uses of indexing. We introduce very flexible logical query plans that can be transformed into a wide variety of physical plans, define appropriate database statistics and a cost model, and describe plan enumeration including heuristics for reducing the search space. Our optimizer is fully implemented for most of the Lore query language, and preliminary performance results are reported.
Interleaving A Join Sequence With Semijoins In Distributed Query Processing
- IEEE Transactions on Parallel and Distributed Systems
, 1992
Cited by 12 (7 self)
In distributed query processing, the conventional approach to reduce the amount of data transmission is to first apply a sequence of semijoins as "reducers" and then ship the resultant relations to the final site to carry out the join operations. Recently, it has been shown that the approach of applying a combination of joins and semijoins as reducers can lead to substantially larger reduction on data transmission required. In this paper, we develop an efficient heuristic approach to determine an effective sequence of semijoin and join reducers. Semijoins whose execution will reduce the amount of data transmission required to perform a join sequence are termed beneficial semijoins for that join sequence. Note that beneficial semijoins include the conventional profitable semijoins and the gainful semijoins that are not profitable themselves but become beneficial due to the inclusion of join reducers. This type of dependency between semijoin and join reducers complicates the identificati...
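The role of a semijoin as a "reducer" can be illustrated with a minimal sketch. It is not the heuristic developed in the paper: the relations, the two-site setup, and the cost measure (number of field values shipped) are all assumptions made for illustration.

```python
def semijoin(r, s, key):
    """Return the rows of r whose key value also appears in s."""
    s_keys = {row[key] for row in s}
    return [row for row in r if row[key] in s_keys]


def join(r, s, key):
    """Hash join of r and s on a shared key column."""
    index = {}
    for row in s:
        index.setdefault(row[key], []).append(row)
    return [{**a, **b} for a in r for b in index.get(a[key], [])]


def shipped_values(rows):
    """Assumed cost measure: total number of field values transmitted."""
    return sum(len(row) for row in rows)


# Hypothetical relations held at two different sites.
orders = [{"cust": c, "item": i}
          for c, i in [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e"), (6, "f")]]
vip = [{"cust": 2, "tier": "gold"}, {"cust": 4, "tier": "silver"}]

# Strategy 1: ship all of `orders` to the site holding `vip`.
cost_without = shipped_values(orders)

# Strategy 2: ship only vip's join column to the orders site, reduce
# `orders` there with a semijoin, then ship the reduced relation back.
vip_keys = [{"cust": row["cust"]} for row in vip]
reduced = semijoin(orders, vip_keys, "cust")
cost_with = shipped_values(vip_keys) + shipped_values(reduced)

result = join(reduced, vip, "cust")
```

The join over the reduced relation yields the same rows as the join over the full relation. Whether the semijoin pays off depends on how selective it is relative to the cost of shipping the join column.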
A state transition model for distributed query processing
- ACM Transactions on Database Systems
, 1986
Cited by 6 (0 self)
A state transition model for the optimization of query processing in a distributed database system is presented. The problem is parametrized by means of a state describing the amount of processing that has been performed at each site where the database is located. A state transition occurs each time a new join or semijoin is executed. Dynamic programming is used to compute recursively the costs of the states and the globally optimal solution, taking into account communication and local processing costs. The state transition model is general enough to account for the possibility of parallel processing among the various sites, as well as for redundancy in the database. The model also permits significant reductions of the necessary computations by taking advantage of simple additivity and site-uniformity properties of a cost model, and of clever strategies that improve on the basic dynamic programming algorithm.
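The general idea of dynamic programming over processing states can be sketched as follows. This is a toy illustration, not the paper's model: the operation names, the cost figures, and the rule that a semijoin makes a later join cheaper are all hypothetical. A state is the set of operations executed so far, each transition executes one more join or semijoin, and costs are assumed to be additive.

```python
from functools import lru_cache

# Hypothetical base costs (communication plus local processing) per operation.
BASE = {"semijoin_R_S": 2, "join_R_S": 10, "join_S_T": 6}

# The query is answered once both joins have been executed.
REQUIRED = frozenset({"join_R_S", "join_S_T"})


def step_cost(done, op):
    """Cost of executing `op` from the state `done` (assumed cost model)."""
    cost = BASE[op]
    if op == "join_R_S" and "semijoin_R_S" in done:
        cost -= 5  # assumed saving: the reducer shrinks the data to ship
    return cost


@lru_cache(maxsize=None)
def best(done):
    """Return (minimal remaining cost, remaining plan) from state `done`."""
    if REQUIRED <= done:
        return 0, ()
    options = []
    for op in BASE:
        if op not in done:
            rest_cost, rest_plan = best(done | {op})
            options.append((step_cost(done, op) + rest_cost, (op,) + rest_plan))
    return min(options)


total_cost, plan = best(frozenset())
```

Memoizing on the state means each set of executed operations is costed once, however many orderings lead to it. That relies on the additivity assumption stated above.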
Fusion Queries over Internet Databases
- In Proc. of the Conf. on Extending Database Technology (EDBT)
, 1997
Cited by 3 (0 self)
Fusion queries search for information integrated from distributed, autonomous sources over the Internet. In this context, data is not cleanly fragmented as in traditional distributed databases, and the number of sources participating in a typical query is large. We investigate techniques for efficient processing of fusion queries. First, we focus on a very wide class of query plans that capture the spirit of many techniques usually considered in existing systems. We show how to efficiently find, under various realistic scenarios, good query plans within this large class. We evaluate the performance of these plans and provide additional heuristics that, by considering plans outside our target class of plans, yield further performance improvements.
1 Introduction
In distributed information systems on the Internet, one often encounters sources that provide incomplete and possibly overlapping information on a set of entities. A fusion query searches over these entities, looking for ones t...
Network-Aware Join Processing in Global-Scale Database Federations
Cited by 3 (1 self)
We introduce join scheduling algorithms that employ a balanced network utilization metric to optimize the use of all network paths in a global-scale database federation. This metric allows algorithms to exploit excess capacity in the network, while avoiding narrow, long-haul paths. We give a two-approximate, polynomial-time algorithm for serial (left-deep) join schedules. We also present extensions to this algorithm that explore parallel schedules, reduce resource usage, and define tradeoffs between computation and network utilization. We evaluate these techniques within the SkyQuery federation of astronomy databases using spatial-join queries submitted by SkyQuery's users. Experiments show that our algorithms realize near-optimal network utilization with minor computational overhead.
GBO Preliminary Research Proposal
Gray and Szalay [1] documented the data avalanche problem in the sciences in which improvements in physical instruments and better data pipelines lead to an exponential growth in data size. Paralleling this exponential trend is the accumulation

