Results 1 - 10 of 35
Query optimization in database systems
- ACM Computing Surveys, 1984
Cited by 194 (0 self)
Efficient methods of processing unanticipated queries are a crucial prerequisite for the success of generalized database management systems. A wide variety of approaches to improve the performance of query evaluation algorithms have been proposed: logic-based and semantic transformations, fast implementations of basic operations, and combinatorial or heuristic algorithms for generating alternative access plans and choosing among them. These methods are presented in the framework of a general query evaluation procedure using the relational calculus representation of queries. In addition, nonstandard query optimization issues such as higher level query evaluation, query optimization in distributed databases, and use of database machines are addressed. The focus, however, is on query optimization in centralized database systems.
Practical Skew Handling in Parallel Joins
- In Proceedings of the 18th VLDB Conference, 1992
Cited by 85 (8 self)
We present an approach to dealing with skew in parallel joins in database systems. Our approach is easily implementable within current parallel DBMS, and performs well on skewed data without degrading the performance of the system on non-skewed data. The main idea is to use multiple algorithms, each specialized for a different degree of skew, and to use a small sample of the relations being joined to determine which algorithm is appropriate. We developed, implemented, and experimented with four new skew-handling parallel join algorithms; one, which we call virtual processor range partitioning, was the clear winner in high skew cases, while traditional hybrid hash join was the clear winner in lower skew or no skew cases. We present experimental results from an implementation of all four algorithms on the Gamma parallel database machine. To our knowledge, these are the first reported skew-handling numbers from an actual implementation.
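The sample-then-choose idea in this abstract can be sketched roughly as follows. This is a minimal illustration only, not the paper's algorithm: the function names, the skew measure (the share of the sample taken by its most frequent join key), and the 0.1 threshold are all assumptions made for the example.

```python
import random
from collections import Counter


def estimate_skew(join_keys, sample_size=100, seed=0):
    """Fraction of a random sample occupied by its most frequent join key.

    The skew measure itself is an assumption made for this sketch.
    """
    if not join_keys:
        return 0.0
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    sample = rng.sample(join_keys, min(sample_size, len(join_keys)))
    top_count = Counter(sample).most_common(1)[0][1]
    return top_count / len(sample)


def choose_join_algorithm(join_keys, threshold=0.1):
    """Pick a strategy name from the sampled skew estimate.

    The threshold is hypothetical; the abstract does not state one.
    """
    if estimate_skew(join_keys) > threshold:
        return "virtual processor range partitioning"
    return "hybrid hash join"
```

A real system would sample both relations and would likely need a more careful skew statistic; the point here is only that a cheap sample drives the choice between specialized algorithms.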
Query Processing in a System for Distributed Databases (SDD-1)
- ACM Transactions on Database Systems, 1981
Cited by 63 (0 self)
This paper describes the techniques used to optimize relational queries in the SDD-1 distributed database system. Queries are submitted to SDD-1 in a high-level procedural language called Datalanguage. Optimization begins by translating each Datalanguage query into a relational calculus form called an envelope, which is essentially an aggregate-free QUEL query. This paper is primarily concerned with the optimization of envelopes. Envelopes are processed in two phases. The first phase executes relational operations at various sites of the distributed database in order to delimit a subset of the database that contains all data relevant to the envelope. This subset is called a reduction of the database. The second phase transmits the reduction to one designated site, and the query is executed locally at that site. The critical optimization problem is to perform the reduction phase efficiently. Success depends on designing a good repertoire of operators to use during this phase, and an effective algorithm for deciding which of these operators to use in processing a given envelope against a given database. The principal reduction operator that we employ is called a
Data allocation in distributed database systems
- ACM Transactions on Database Systems, 1988
Cited by 61 (1 self)
The problem of allocating the data of a database to the sites of a communication network is investigated. This problem deviates from the well-known file allocation problem in several aspects. First, the objects to be allocated are not known a priori; second, these objects are accessed by schedules that contain transmissions between objects to produce the result. A model that makes it possible to compare the cost of allocations is presented; the cost can be computed for different cost functions and for processing schedules produced by arbitrary query processing algorithms. For minimizing the total transmission cost, a method is proposed to determine the fragments to be allocated from the relations in the conceptual schema and the queries and updates executed by the users. For the same cost function, the complexity of the data allocation problem is investigated. Methods for obtaining optimal and heuristic solutions under various ways of computing the cost of an allocation are presented and compared. Two different approaches to the allocation management problem are presented and their merits are discussed.
Operator Placement for In-Network Stream Query Processing
- In Proc. of the 24th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS), 2005
Cited by 48 (0 self)
In sensor networks, data acquisition frequently takes place at low-capability devices. The acquired data is then transmitted through a hierarchy of nodes having progressively increasing network bandwidth and computational power. We consider the problem of executing queries over these data streams, posed at the root of the hierarchy. To minimize data transmission, it is desirable to perform “in-network” query processing: do some part of the work at intermediate nodes as the data travels to the root. Most previous work on in-network query processing has focused on aggregation and inexpensive filters. In this paper, we address in-network processing for queries involving possibly expensive conjunctive filters, and joins. We consider the problem of placing operators along the nodes of the hierarchy so that the overall cost of computation and data transmission is minimized. We show that the problem is tractable, give an optimal algorithm, and demonstrate that a simpler greedy operator placement algorithm can fail to find the optimal solution. Finally we define a number of interesting variations of the basic operator placement problem and demonstrate their hardness.
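To make the computation-versus-transmission trade-off concrete, here is a small sketch that places a single filter on a chain of nodes by exhaustive search. It is not the optimal algorithm from the paper: the cost model (per-tuple compute cost at each node, per-tuple cost on each link, one filter with a fixed selectivity) and the names `placement_cost` and `best_placement` are assumptions chosen for illustration.

```python
def placement_cost(k, compute_cost, link_cost, selectivity):
    """Per-input-tuple cost of running the filter at node k.

    Node 0 is the leaf and the last node is the root; link_cost[i] is the
    cost of sending one tuple from node i to node i + 1. (Assumed model.)
    """
    before = sum(link_cost[:k])                # unfiltered data below node k
    after = selectivity * sum(link_cost[k:])   # filtered data from node k up
    return before + compute_cost[k] + after


def best_placement(compute_cost, link_cost, selectivity):
    """Try every node and return (node index, cost) of the cheapest one."""
    if len(link_cost) != len(compute_cost) - 1:
        raise ValueError("need exactly one link between consecutive nodes")
    costs = [
        placement_cost(k, compute_cost, link_cost, selectivity)
        for k in range(len(compute_cost))
    ]
    best = min(range(len(costs)), key=costs.__getitem__)
    return best, costs[best]
```

With cheap links and an expensive leaf, the search pushes the filter toward the root; with expensive links and a selective filter, it pulls the filter down to the leaf. Exhaustive search is fine for one operator on a short chain but does not address the multi-operator problem the abstract describes.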
Building a High-Level Dataflow System on top of Map-Reduce: The Pig Experience
- In VLDB '09: Proceedings of the VLDB Endowment, 2009
Cited by 27 (4 self)
Increasingly, organizations capture, transform and analyze enormous data sets. Prominent examples include internet companies and e-science. The Map-Reduce scalable dataflow paradigm has become popular for these applications. Its simple, explicit dataflow programming model is favored by some over the traditional high-level declarative approach: SQL. On the other hand, the extreme simplicity of Map-Reduce leads to much low-level hacking to deal with the many-step, branching dataflows that arise in practice. Moreover, users must repeatedly code standard operations such as join by hand. These practices waste time, introduce bugs, harm readability, and impede optimizations. Pig is a high-level dataflow system that aims at a sweet spot between SQL and Map-Reduce. Pig offers SQL-style high-level data manipulation constructs, which can be assembled in an explicit dataflow and interleaved with custom Map- and Reduce-style functions or executables. Pig programs are compiled into sequences of Map-Reduce jobs, and executed in the Hadoop Map-Reduce environment. Both Pig and Hadoop are open-source projects administered by the Apache Software Foundation. This paper describes the challenges we faced in developing Pig, and reports performance comparisons between Pig execution and raw Map-Reduce execution.
InterViso: dealing with the complexity of federated database access
- The VLDB Journal, vol 4 (2), pp 287-318. Springer-Verlag New York, Inc. http://dx.doi.org/10.1007/BF01237922, 1995
Cited by 23 (0 self)
Connectivity products are finally available to provide the "highways" between computers containing data. IBM has provided strong validation of the concept with their "Information Warehouse." DBMS vendors are providing gateways into their products, and SQL is being retrofitted on many older DBMSs to make it easier to access data from standard 4GL products and application development systems. The next step needed for data integration is to provide (1) a common data dictionary with a conceptual schema across the data to mask the many differences that occur when databases are developed independently and (2) a server that can access and integrate the databases using information from the data dictionary. In this article, we discuss InterViso, one of the first commercial federated database products. InterViso is based on Mermaid, which was developed at SDC and Unisys (Templeton et al., 1987b). It provides a value added layer above connectivity products to handle views across databases, schema translation, and transaction management. Key Words. Federated database, database integration, data warehouse.
Using Shared Virtual Memory for Parallel Join Processing
- ACM-SIGMOD Int. Conf., 1993
Cited by 19 (2 self)
In this paper, we show that shared virtual memory, in a shared-nothing multiprocessor, facilitates the design and implementation of parallel join processing algorithms that perform significantly better in the presence of skew than previously proposed parallel join processing algorithms. We propose two variants of an algorithm for parallel join processing using shared virtual memory, and perform a detailed simulation to investigate their performance. The algorithm is unique in that it employs both the shared virtual memory paradigm and the message-passing paradigm used by current shared-nothing parallel database systems. The implementation of the algorithm requires few modifications to existing shared-nothing parallel database systems. 1 Introduction The next generation of shared-nothing multiprocessors is expected to be equipped with shared virtual memory (henceforth called SVM) providing a globally shared address space (e.g. the Intel Paragon product literature states that it will pr...
Groupwise processing of relational queries
- Proceedings of the 1997 VLDB Conference, 1997
Cited by 17 (3 self)
In this paper, we define and examine a particular class of queries called group queries. Group queries are natural queries in many decision-support applications. The main characteristic of a group query is that it can be executed in a group-by-group fashion. In other words, the underlying relation(s) can be partitioned (based on some set of attributes) into disjoint groups, and each group can be processed separately. We give a syntactic criterion to identify these queries and prove its sufficiency. We also prove the strong result that every group query has an equivalent formulation that satisfies our syntactic criterion. We describe a general evaluation technique for group queries, and demonstrate how an optimizer can determine this plan. We then consider more complex queries whose components are group queries with potentially different partitioning attributes. We give two methods to identify group query components within such a query. We also give some performance results for group queries expressed in standard SQL, comparing a commercial database system with our optimized plan on top of the same commercial system. These results indicate that there are significant potential performance improvements.
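As a rough illustration of group-by-group execution, the sketch below partitions a relation on a set of attributes and evaluates a per-group function on each partition independently. The relation layout (a list of dicts), the names `partition`, `run_groupwise`, and `above_group_average`, and the sample query are assumptions made for the example; this is not the evaluation technique or the syntactic criterion from the paper.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Split rows into disjoint groups keyed by the values of attrs."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in attrs)].append(row)
    return groups


def run_groupwise(rows, attrs, per_group_query):
    """Evaluate per_group_query on each group separately and concatenate."""
    result = []
    for group in partition(rows, attrs).values():
        result.extend(per_group_query(group))
    return result


def above_group_average(group):
    """Hypothetical per-group query: rows whose amount exceeds the group mean."""
    mean = sum(r["amount"] for r in group) / len(group)
    return [r for r in group if r["amount"] > mean]
```

Because each call to `per_group_query` sees only one group, the groups could in principle be processed one at a time in limited memory or in parallel, which is the property the abstract attributes to group queries.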
Efficient Query Processing for Data Integration
- 2002
Cited by 17 (2 self)
A major problem today is that important data is scattered throughout dozens of separately evolved data sources, in a form that makes the "big picture" difficult to obtain. Data integration presents a unified virtual view of all data within a domain, allowing the user to pose queries across the complete integrated schema. This dissertation addresses the performance needs...

