Query optimization in database systems
- ACM Computing Surveys, 1984
Cited by 194 (0 self)
Efficient methods of processing unanticipated queries are a crucial prerequisite for the success of generalized database management systems. A wide variety of approaches to improve the performance of query evaluation algorithms have been proposed: logic-based and semantic transformations, fast implementations of basic operations, and combinatorial or heuristic algorithms for generating alternative access plans and choosing among them. These methods are presented in the framework of a general query evaluation procedure using the relational calculus representation of queries. In addition, nonstandard query optimization issues such as higher level query evaluation, query optimization in distributed databases, and use of database machines are addressed. The focus, however, is on query optimization in centralized database systems.
An overview of query optimization in relational systems
- In PODS, 1998
Cited by 99 (1 self)
There has been extensive work in query optimization since the early ’70s. It is hard to capture the breadth and depth of this large body of work in a short article. Therefore, I have decided to focus primarily on the optimization of SQL queries in relational database systems and present my biased and incomplete view of this field. The goal of this article is not to be comprehensive, but rather to explain the foundations and present samplings of significant work in this area. I would like to apologize to the many contributors in this area whose work I have failed to explicitly acknowledge due to oversight or lack of space. I take the liberty of trading technical precision for ease of presentation.
Data allocation in distributed database systems
- ACM Transactions on Database Systems, 1988
Cited by 61 (1 self)
The problem of allocating the data of a database to the sites of a communication network is investigated. This problem deviates from the well-known file allocation problem in several aspects. First, the objects to be allocated are not known a priori; second, these objects are accessed by schedules that contain transmissions between objects to produce the result. A model that makes it possible to compare the cost of allocations is presented; the cost can be computed for different cost functions and for processing schedules produced by arbitrary query processing algorithms. For minimizing the total transmission cost, a method is proposed to determine the fragments to be allocated from the relations in the conceptual schema and the queries and updates executed by the users. For the same cost function, the complexity of the data allocation problem is investigated. Methods for obtaining optimal and heuristic solutions under various ways of computing the cost of an allocation are presented and compared. Two different approaches to the allocation management problem are presented and their merits are discussed.
Optimizing Large Join Queries in Mediation Systems
- International Conference on Database Theory (ICDT), 1999
Cited by 37 (11 self)
In data integration systems, queries posed to a mediator need to be translated into a sequence of queries to the underlying data sources. In a heterogeneous environment, with sources of diverse and limited query capabilities, not all the translations are feasible. In this paper, we study the problem of finding feasible and efficient query plans for mediator systems. We consider conjunctive queries on mediators and model the source capabilities through attribute-binding adornments. We use a simple cost model that focuses on the major costs in mediation systems, those involved with sending queries to sources and getting answers back. Under this metric, we develop two algorithms for source query sequencing -- one based on a simple greedy strategy and another based on a partitioning scheme. The first algorithm produces optimal plans in some scenarios, and we show a linear bound on its worst case performance when it misses optimal plans. The second algorithm generates optimal plans in mor...
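The idea of sequencing source queries under binding restrictions can be illustrated with a small sketch. It is not the paper's algorithm: the `Source` tuple layout, the `greedy_sequence` function, and the fixed per-source costs are all assumptions made for illustration (a realistic cost model would depend on the sizes of intermediate results). At each step the sketch picks the cheapest source whose required attributes are already bound, and reports failure when no remaining source is answerable.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Tuple

# Hypothetical source description:
# (name, attributes that must be bound before querying, attributes the source returns).
Source = Tuple[str, FrozenSet[str], FrozenSet[str]]


def greedy_sequence(
    sources: Iterable[Source],
    initially_bound: Iterable[str],
    cost: Dict[str, float],
) -> Optional[List[str]]:
    """Order sources so every query is feasible; return None if no order exists."""
    bound = set(initially_bound)
    remaining = list(sources)
    plan: List[str] = []
    while remaining:
        # A source is feasible once all attributes it requires are bound.
        feasible = [s for s in remaining if s[1] <= bound]
        if not feasible:
            return None  # some source can never be queried
        best = min(feasible, key=lambda s: cost[s[0]])  # assumed static cost
        plan.append(best[0])
        bound |= best[2]  # answers bind new attributes for later sources
        remaining.remove(best)
    return plan


sources = [
    ("books", frozenset({"author"}), frozenset({"author", "isbn"})),
    ("prices", frozenset({"isbn"}), frozenset({"isbn", "price"})),
    ("authors", frozenset(), frozenset({"author"})),
]
cost = {"authors": 5.0, "books": 3.0, "prices": 1.0}
plan = greedy_sequence(sources, [], cost)
```

Here `prices` is the cheapest source but cannot be queried first, because its `isbn` attribute is bound only after `books` has been answered.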
Reformulating Query Plans For Multidatabase Systems
- In Proceedings of the Second International Conference on Information and Knowledge Management, 1993
Cited by 21 (14 self)
A practical heterogeneous, distributed multidatabase system must answer queries efficiently. Conventional query optimization techniques are not adequate here because these techniques are dependent on the database structure, and rely on limited information which is not sufficient in complicated multidatabase queries. This paper presents an automated approach to reformulating query plans to improve the efficiency of multidatabase queries. This approach uses database abstractions, the knowledge about the contents of databases, to reformulate a query plan into a less expensive but semantically equivalent one. We present two algorithms. The first algorithm reformulates subqueries to individual databases; the second algorithm extends the first one and reformulates the entire query plan. Empirical results show that the reformulations can provide significant savings with minimal overhead. The reformulation approach provides a global reduction in the amount of the intermediate data as well as local opt...
A survey of parallel execution strategies for transitive closure and logic programs
- Distributed and Parallel Databases, 1993
Cited by 20 (5 self)
An important feature of database technology of the nineties is the use of parallelism for speeding up the execution of complex queries. This technology is being tested in several experimental database architectures and a few commercial systems for conventional select-project-join queries. In particular, hash-based fragmentation is used to distribute data to disks under the control of different processors in order to perform selections and joins in parallel. With the development of new query languages, and in particular with the definition of transitive closure queries and of more general logic programming queries, the new dimension of recursion has been added to query processing. Recursive queries are complex; at the same time, their regular structure is particularly suited for parallel execution, and parallelism may give a high efficiency gain. We survey the approaches to parallel execution of recursive queries that have been presented in the recent literature. We observe that research on parallel execution of recursive queries is separated into two distinct subareas, one focused on the transitive closure of Relational Algebra expressions, the other one focused on optimization of more general Datalog queries. Though the subareas seem radically different because of the approach and formalism used, they have many common features. This is not surprising, because most typical Datalog queries can be solved by means of the transitive closure of simple
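As a rough illustration of how hash-based fragmentation and transitive closure fit together, the sketch below splits an edge relation into fragments by hashing the source node and then runs a semi-naive closure in which each round joins only the newly derived pairs against every fragment. The function names and the fragment count are assumptions, and the fragments are processed one after another here; in a parallel system each fragment's join could run on its own processor. It is not taken from any of the surveyed strategies.

```python
from collections import defaultdict


def hash_fragment(edges, n_fragments):
    """Split edges into fragments by hashing the source node (assumed scheme)."""
    fragments = [defaultdict(set) for _ in range(n_fragments)]
    for src, dst in edges:
        fragments[hash(src) % n_fragments][src].add(dst)
    return fragments


def transitive_closure(edges, n_fragments=2):
    """Semi-naive transitive closure over hash-fragmented edges."""
    fragments = hash_fragment(edges, n_fragments)
    closure = set(edges)
    delta = set(edges)  # pairs derived in the previous round
    while delta:
        new_pairs = set()
        # Each fragment's join is independent of the others, so these
        # loop iterations are the unit of work that could run in parallel.
        for fragment in fragments:
            for a, b in delta:
                for c in fragment.get(b, ()):
                    new_pairs.add((a, c))
        delta = new_pairs - closure  # keep only genuinely new pairs
        closure |= delta
    return closure


chain = transitive_closure({(1, 2), (2, 3), (3, 4)})
```

Joining only `delta` rather than the whole closure in each round is what makes the iteration semi-naive; the loop ends once a round derives nothing new, which also guarantees termination on cyclic graphs.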
Computing Complete Answers to Queries in the Presence of Limited Access Patterns
- Journal of VLDB, 1999
Cited by 15 (3 self)
In data applications such as information integration, there can be limited access patterns to relations, i.e., binding patterns require values to be specified for certain attributes in order to retrieve data from a relation. As a consequence, we cannot retrieve all tuples from these relations. In this article we study the problem of computing the complete answer to a query, i.e., the answer that could be computed if all the tuples could be retrieved. A query is stable if for any instance of the relations in the query, its complete answer can be computed using the access patterns permitted by the relations. We study the problem of testing stability of various classes of queries, including conjunctive queries, unions of conjunctive queries, and conjunctive queries with arithmetic comparisons.
XRPC: Interoperable and Efficient Distributed XQuery
- In VLDB, 2007
Cited by 14 (5 self)
We propose XRPC, a minimal XQuery extension that enables distributed yet efficient querying of heterogeneous XQuery data sources. XRPC enhances the existing concept of XQuery functions with the Remote Procedure Call (RPC) paradigm. By calling out of an XQuery for-loop to multiple destinations, and by calling functions that themselves perform XRPC calls, complex P2P communication patterns can be achieved. The XRPC extension is orthogonal to all XQuery features, including the XQuery Update Facility (XQUF). We provide formal semantics for XRPC that encompasses execution of both read-only and update queries. XRPC is also a network SOAP sub-protocol that integrates seamlessly with web services, Service Oriented Architectures (SOA), and AJAX-based GUIs. A crucial feature of the protocol is bulk RPC, which allows remote execution of many different calls to the same procedure, using possibly a single network round-trip. The efficiency potential of XRPC is demonstrated via an open-source implementation in MonetDB/XQuery. We show, however, that XRPC is not system-specific: every XQuery data source can service XRPC calls using a wrapper. Since XQuery is a pure functional language, we can leverage techniques developed for functional query decomposition to rewrite data shipping queries into XRPC-based function shipping queries. Powerful distributed database techniques (such as semi-join optimizations) directly map on bulk RPC, opening up interesting future work opportunities.
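The bulk RPC idea, many calls to the same procedure sharing one round-trip, can be sketched independently of XQuery and SOAP. The code below is only an illustration under assumptions: `bulk_call`, the `(destination, procedure, args)` call format, and the `send` callback are hypothetical and are not part of XRPC or MonetDB/XQuery. Calls are grouped by destination and procedure, each group is sent once, and the replies are put back in the original call order.

```python
from collections import defaultdict


def bulk_call(calls, send):
    """Batch calls per (destination, procedure) and preserve result order.

    calls: list of (destination, procedure, args) tuples.
    send:  hypothetical transport; send(destination, procedure, list_of_args)
           returns one reply per argument tuple, in order.
    """
    groups = defaultdict(list)
    for index, (destination, procedure, args) in enumerate(calls):
        groups[(destination, procedure)].append((index, args))

    results = [None] * len(calls)
    for (destination, procedure), items in groups.items():
        # One round-trip carries every call in the group.
        replies = send(destination, procedure, [args for _, args in items])
        for (index, _), reply in zip(items, replies):
            results[index] = reply
    return results


round_trips = []


def fake_send(destination, procedure, batch):
    """Stand-in for the network: records the round-trip and adds the arguments."""
    round_trips.append((destination, procedure, len(batch)))
    return [sum(args) for args in batch]


calls = [
    ("peerA", "add", (1, 2)),
    ("peerB", "add", (3, 4)),
    ("peerA", "add", (5, 6)),
]
results = bulk_call(calls, fake_send)
```

Three calls to two peers cost two round-trips instead of three; with a loop issuing thousands of calls to a handful of peers, the saving grows accordingly.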
Query Optimization in Mobile Environments
- In Workshop on Foundations of Models and Languages for Data and Objects, 1993
Cited by 13 (0 self)
We consider the issue of optimizing queries for distributed processing in a mobile environment. An interesting characteristic of mobile machines is that they depend on a battery as a source of energy, which may not be substantial enough. Hence, the appropriate optimization criterion in a mobile environment considers both resource utilization and energy consumption at the mobile client. In this scenario, the optimal plan for a query depends on the residual battery level of the mobile client and the load at the server. We approach this problem by compiling a query into a sequence of candidate plans, such that for any state of the client-server system, the optimal plan is one of the candidate plans. A general solution is proposed by adapting the partial order dynamic programming search algorithm [17, 18] (p.o dp) such that the coverset of the query is the set of candidate plans. We propose two novel algorithms, namely, the linear combinations algorithm and the linearset algorithm (referred t...
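The notion of a set of candidate plans that contains the optimum for every system state can be illustrated with a two-dimensional cost vector. The sketch below is an assumption-laden simplification, not the paper's algorithm: plans are given as hypothetical `(client_energy, server_cost)` pairs, the candidate set is taken to be the plans that no other plan dominates in both dimensions, and the run-time choice minimizes a weighted sum whose weights stand in for battery level and server load.

```python
def coverset(plans):
    """Return the plans that no other plan dominates.

    plans: dict mapping plan name -> (client_energy, server_cost).
    A plan dominates another if it is no worse in both dimensions
    and differs in at least one.
    """
    def dominates(p, q):
        return p[0] <= q[0] and p[1] <= q[1] and p != q

    return {
        name: cost
        for name, cost in plans.items()
        if not any(dominates(other, cost) for other in plans.values())
    }


def pick(candidates, w_energy, w_server):
    """Choose the candidate with the lowest weighted cost (assumed linear)."""
    return min(
        candidates,
        key=lambda name: w_energy * candidates[name][0]
        + w_server * candidates[name][1],
    )


# Hypothetical plans and costs, for illustration only.
plans = {
    "ship_data": (9, 1),
    "ship_query": (2, 6),
    "hybrid": (5, 5),
    "wasteful": (9, 7),
}
candidates = coverset(plans)
low_battery_choice = pick(candidates, w_energy=1.0, w_server=0.0)
busy_server_choice = pick(candidates, w_energy=0.0, w_server=1.0)
```

With non-negative weights, a dominated plan such as `wasteful` can never be strictly better than the plan that dominates it, so discarding it at compile time loses nothing at run time.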
AmbientDB: relational query processing in a P2P network
- In Proceedings of the International Workshop on Databases, Information Systems and Peer-to-Peer Computing (DBISP2P), LNCS 2788, 2003
Cited by 13 (1 self)
A new generation of applications running on a network of nodes that share data on an ad-hoc basis will benefit from data management services including powerful querying facilities. In this paper, we introduce the goals, assumptions and architecture of AmbientDB, a new peer-to-peer (P2P) DBMS prototype developed at CWI. Our focus is on the query processing facilities of AmbientDB, which are based on a three-level translation of a global query algebra into multiwave stream processing plans, distributed over an ad-hoc P2P network. We illustrate the usefulness of our system by outlining how it eases construction of a music player that generates intelligent playlists with collaborative filtering over distributed music logs. Finally, we show how the use of Distributed Hash Tables (DHT) at the basis of AmbientDB allows applications like the P2P music player to scale to large numbers of nodes.

