Parallel database systems: the future of high performance database systems
- Communications of the ACM
, 1992
Cited by 466 (8 self)
Abstract: Parallel database machine architectures have evolved from the use of exotic hardware to a software parallel dataflow architecture based on conventional shared-nothing hardware. These new designs provide impressive speedup and scaleup when processing relational database queries. This paper reviews the techniques used by such systems, and surveys current commercial and research systems.
The Gamma database machine project
- IEEE Transactions on Knowledge and Data Engineering
, 1990
Cited by 203 (27 self)
This paper describes the design of the Gamma database machine and the techniques employed in its implementation. Gamma is a relational database machine currently operating on an Intel iPSC/2 hypercube with 32 processors and 32 disk drives. Gamma employs three key technical ideas which enable the architecture to be scaled to hundreds of processors. First, all relations are horizontally partitioned across multiple disk drives, enabling relations to be scanned in parallel. Second, novel parallel algorithms based on hashing are used to implement the complex relational operators such as join and aggregate functions. Third, dataflow scheduling techniques are used to coordinate multioperator queries. By using these techniques it is possible to control the execution of very complex queries with minimal coordination, a necessity for configurations involving a very large number of processors. In addition to describing the design of the Gamma software, a thorough performance evaluation of the iPSC/2 hypercube version of Gamma is also presented. Besides measuring the effect of relation size and indices on the response time for selection, join, aggregation, and update queries, we also analyze the performance of Gamma relative to the number of processors employed when the sizes of the input relations are kept constant (speedup) and when the sizes of the input relations are increased proportionally to the number of processors (scaleup). The speedup results obtained for both selection and join queries are linear; thus, doubling the number of processors
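The first two ideas in this abstract (horizontal partitioning and hash-based join) and the two metrics it names can be illustrated with a small single-process sketch. This is not Gamma's implementation: the function names, the dictionary row format, and the use of Python's built-in `hash` are assumptions made for illustration, and the "nodes" are simulated by plain lists. The `speedup` and `scaleup` helpers use the usual elapsed-time ratios, under which linear behaviour corresponds to a speedup equal to the processor ratio and a scaleup of 1.0.

```python
from collections import defaultdict


def hash_partition(rows, key, n_nodes):
    """Horizontally partition rows into n_nodes buckets by hashing the key."""
    parts = [[] for _ in range(n_nodes)]
    for row in rows:
        parts[hash(row[key]) % n_nodes].append(row)
    return parts


def local_hash_join(build_rows, probe_rows, key):
    """Join two row lists on one simulated node using an in-memory hash table."""
    table = defaultdict(list)
    for r in build_rows:
        table[r[key]].append(r)
    joined = []
    for s in probe_rows:
        for r in table.get(s[key], []):
            joined.append({**r, **s})
    return joined


def partitioned_join(r_rows, s_rows, key, n_nodes):
    """Partition both inputs on the join key, then join matching buckets.

    Rows with equal keys land in the same bucket, so each bucket pair can be
    joined independently (here sequentially; on real hardware, in parallel).
    """
    r_parts = hash_partition(r_rows, key, n_nodes)
    s_parts = hash_partition(s_rows, key, n_nodes)
    result = []
    for r_part, s_part in zip(r_parts, s_parts):
        result.extend(local_hash_join(r_part, s_part, key))
    return result


def speedup(elapsed_small_system, elapsed_large_system):
    """Same input size on both systems; larger is better."""
    return elapsed_small_system / elapsed_large_system


def scaleup(elapsed_small_system_small_input, elapsed_large_system_large_input):
    """Input grows in proportion to the system; 1.0 means linear scaleup."""
    return elapsed_small_system_small_input / elapsed_large_system_large_input
```

For example, if a query takes 100 seconds on 4 processors and 50 seconds on 8 processors with the same input, `speedup(100.0, 50.0)` gives 2.0, which matches the doubling of processors.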
Parallel Database Systems: The Future of Database Processing or a Passing Fad?
- SIGMOD RECORD
, 1991
Cited by 46 (6 self)
Parallel database machine architectures have evolved from the use of exotic hardware to a software parallel dataflow architecture based on conventional shared-nothing hardware. These new designs provide impressive speedup and scaleup when processing relational database queries. This paper reviews the techniques used by such systems, and surveys current commercial and research systems.
Partitioning Similarity Graphs: A Framework for Declustering Problems
- Information Systems Journal
, 1996
Cited by 29 (3 self)
Declustering problems are well known in databases for parallel computing environments. In this paper, we propose a new similarity-based technique for declustering data. The proposed method can adapt to the available information about query distribution (e.g. size, shape and frequency) and can work with alternative atomic data-types. Furthermore, the proposed method is flexible and can work with alternative data distributions, data sizes and partition-size constraints. The method is based on max-cut partitioning of a similarity graph defined over the given set of data, under constraints on the partition sizes. It maximizes the chances that a pair of atomic data-items that are frequently accessed together by queries are allocated to distinct disks. We describe the application of the proposed method to parallelizing Grid Files at the data page level. Detailed experiments in this context show that the proposed method adapts to query distribution and data distribution, and tha...
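The idea of max-cut partitioning under partition-size constraints can be sketched with a simple greedy heuristic. This is an illustrative stand-in, not the algorithm from the paper: the function names, the `frozenset`-keyed similarity dictionary, and the uniform `capacity` per disk are all assumptions. Each item is placed on the non-full disk where it has the least total similarity to items already there, which pushes frequently co-accessed pairs onto distinct disks.

```python
def greedy_decluster(items, similarity, n_disks, capacity):
    """Assign items to disks so that similar items tend to be separated.

    similarity maps frozenset({a, b}) to a non-negative weight (for example,
    how often a and b are accessed by the same query). Missing pairs count
    as weight 0. capacity is the maximum number of items per disk.
    """
    assignment = {}
    load = [0] * n_disks
    for item in items:
        best_disk, best_cost = None, None
        for disk in range(n_disks):
            if load[disk] >= capacity:
                continue  # respect the partition-size constraint
            # Similarity that would stay *inside* this disk (uncut weight).
            cost = sum(
                similarity.get(frozenset((item, other)), 0.0)
                for other, other_disk in assignment.items()
                if other_disk == disk
            )
            if best_cost is None or cost < best_cost:
                best_disk, best_cost = disk, cost
        if best_disk is None:
            raise ValueError("not enough total capacity for all items")
        assignment[item] = best_disk
        load[best_disk] += 1
    return assignment


def cut_weight(assignment, similarity):
    """Total similarity between items placed on different disks."""
    total = 0.0
    for pair, weight in similarity.items():
        a, b = tuple(pair)
        if assignment[a] != assignment[b]:
            total += weight
    return total
```

A greedy pass like this gives no optimality guarantee; it only shows how the objective (maximize the cut) and the constraint (bounded partition size) interact.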
DB2 Parallel Edition
, 1995
Cited by 29 (0 self)
The rate of increase in database size and response time requirements has outpaced advancements in processor and mass storage technology. One way to satisfy the increasing demand for processing power and I/O bandwidth in database applications is to have a number of processors, loosely or tightly coupled, serving database requests concurrently. Technologies developed during the last decade have made commercial parallel database systems a reality and these systems have made an inroad into the stronghold of traditionally mainframe based large database applications. This paper describes the parallel database project initiated at IBM Research at Hawthorne and the DB2/AIX-PE product based on it. 1 Introduction Large scale parallel processing technology has made giant strides in the past decade and there is no doubt that it has established a place for itself. However, almost all of the applications harnessing this technology are scientific or engineering applications. The lack of com...
A Similarity Graph-Based Approach to Declustering Problems and Its Application towards Parallelizing Grid Files
- In the 11th Inter. Conference on Data Engineering
, 1995
Cited by 19 (8 self)
We propose a new similarity-based technique for declustering data. The proposed method can adapt to available information about query distributions, data distributions, data sizes and partition-size constraints. The method is based on max-cut partitioning of a similarity graph defined over the given set of data, under constraints on the partition sizes. It maximizes the chances that a pair of data-items that are to be accessed together by queries are allocated to distinct disks. We show that the proposed method can achieve optimal speed-up for a query-set, if there exists any other declustering method which will achieve the optimal speed-up. Experiments in parallelizing Grid Files show that the proposed method outperforms mapping-function-based methods for interesting query distributions as well as for non-uniform data distributions. 1 Introduction With an increasing performance gap between processors and I/O systems, parallelizing I/O operations by declustering data [8, 7, 21] is becom...
Multi-dimensional database allocation for parallel data warehouses
- Proc. 26th VLDB Conference
, 2000
Cited by 17 (3 self)
Data allocation is a key performance factor for parallel database systems (PDBS). This holds especially for data warehousing environments where huge amounts of data and complex analytical queries have to be dealt with. While there are several studies on data allocation for relational PDBS, the specific requirements of data warehouses have not yet been sufficiently addressed. In this study, we consider the allocation of relational data warehouses based on a star schema and utilizing bitmap index structures. We investigate how a multi-dimensional hierarchical data fragmentation of the fact table supports queries referencing different subsets of the schema dimensions. Our analysis is based on realistic parameters derived from a decision support benchmark. The performance implications of different allocation choices are evaluated by means of a detailed simulation model.
Tradeoffs in Processing Multi-Way Join Queries via Hashing in Multiprocessor Database Machines
- In Proceedings of the 16th International Conference on Very Large Data Bases (VLDB)
, 1990
Cited by 1 (0 self)
During the past five years the design, implementation, and evaluation of join algorithms that exploit large main memories and parallel processors has received a great deal of attention. However, most of this work has addressed the problem of executing joins involving only two relations. In this paper we examine the problem of processing multi-way join queries through hash-based join methods in a shared-nothing database environment. We first discuss how the choice of a format for a complex query can have a significant effect on the performance of a multiprocessor database machine. Experimental results obtained from a simulation study are then presented to demonstrate the tradeoffs of left-deep and right-deep scheduling strategies for complex join query evaluation. These results demonstrate that right-deep scheduling strategies can provide significant performance advantages in large multiprocessor database machines, even when memory is limited. 1. Introduction Several important trends...
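The contrast between left-deep and right-deep hash-join plans can be made concrete with a toy, single-process sketch. It assumes the common convention that the left input of a hash join builds the hash table and the right input probes it; the function names and dictionary row format are hypothetical, and no claim is made that this mirrors the simulator used in the paper. In the left-deep version each intermediate result is materialised and becomes the next build input, while in the right-deep version all hash tables are built up front and one probing relation is pipelined through them.

```python
from collections import defaultdict


def build(rows, key):
    """Build an in-memory hash table on key."""
    table = defaultdict(list)
    for row in rows:
        table[row[key]].append(row)
    return table


def probe(table, rows, key):
    """Lazily probe the table with rows, yielding merged matches."""
    for row in rows:
        for match in table.get(row[key], []):
            yield {**match, **row}


def left_deep(first_rows, others):
    """others is a list of (rows, key) pairs joined in order.

    Each intermediate result is fully materialised and then used to build
    the hash table for the next join, so joins run one after another.
    """
    current = list(first_rows)
    for rows, key in others:
        table = build(current, key)
        current = list(probe(table, rows, key))
    return current


def right_deep(probe_rows, builds):
    """builds is a list of (rows, key) pairs.

    All hash tables are built first (these builds are independent of each
    other, so they could proceed in parallel and must all fit in memory),
    then probe tuples flow through every table without materialisation.
    """
    tables = [(build(rows, key), key) for rows, key in builds]
    stream = iter(probe_rows)
    for table, key in tables:
        stream = probe(table, stream, key)
    return list(stream)
```

Both plans return the same rows for the same query; what differs is how many hash tables are resident at once and whether intermediate results are stored, which is the kind of tradeoff the abstract refers to.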
dewitt @ cs.wisc.edu
, 1992
Abstract: Parallel database machine architectures have evolved from the use of exotic hardware to a software parallel dataflow architecture based on conventional shared-nothing hardware. These new designs provide impressive speedup and scaleup when processing relational database queries. This paper reviews the techniques used by such systems, and surveys current commercial and research systems.

