Results 1 - 10
of
41
Parallel database systems: the future of high performance database systems
- Communications of the ACM
, 1992
"... Abstract: Parallel database machine architectures have evolved from the use of exotic hardware to a software parallel dataflow architecture based on conventional shared-nothing hardware. These new designs provide impressive speedup and scaleup when processing relational database queries. This paper ..."
Abstract
-
Cited by 466 (8 self)
- Add to MetaCart
Abstract: Parallel database machine architectures have evolved from the use of exotic hardware to a software parallel dataflow architecture based on conventional shared-nothing hardware. These new designs provide impressive speedup and scaleup when processing relational database queries. This paper reviews the techniques used by such systems, and surveys current commercial and research systems. 1.
The Gamma database machine project
- IEEE Transactions on Knowledge and Data Engineering
, 1990
"... This paper describes the design of the Gamma database machine and the techniques employed in its implementation. Gamma is a relational database machine currently operating on an Intel iPSC/2 hypercube with 32 processors and 32 disk drives. Gamma employs three key technical ideas which enable the arc ..."
Abstract
-
Cited by 203 (27 self)
- Add to MetaCart
This paper describes the design of the Gamma database machine and the techniques employed in its implementation. Gamma is a relational database machine currently operating on an Intel iPSC/2 hypercube with 32 processors and 32 disk drives. Gamma employs three key technical ideas which enable the architecture to be scaled to 100s of processors. First, all relations are horizontally partitioned across multiple disk drives enabling relations to be scanned in parallel. Second, novel parallel algorithms based on hashing are used to implement the complex relational operators such as join and aggregate functions. Third, dataflow scheduling techniques are used to coordinate multioperator queries. By using these techniques it is possible to control the execution of very complex queries with minimal coordination- a necessity for configurations involving a very large number of processors. In addition to describing the design of the Gamma software, a thorough performance evaluation of the iPSC/2 hypercube version of Gamma is also presented. In addition to measuring the effect of relation size and indices on the response time for selection, join, aggregation, and update queries, we also analyze the performance of Gamma relative to the number of processors employed when the sizes of the input relations are kept constant (speedup) and when the sizes of the input relations are increased proportionally to the number of processors (scaleup). The speedup results obtained for both selection and join queries are linear; thus, doubling the number of processors
Staggered Striping in Multimedia Information Systems
, 1994
"... Multimedia information systems have emerged as an essential component of many application domains ranging from library information systems to entertainment technology. However, most implementations of these systems cannot support the continuous display of multimedia objects and suffer from freque ..."
Abstract
-
Cited by 163 (31 self)
- Add to MetaCart
Multimedia information systems have emerged as an essential component of many application domains ranging from library information systems to entertainment technology. However, most implementations of these systems cannot support the continuous display of multimedia objects and suffer from frequent disruptions and delays termed hiccups. This is due to the low I/O bandwidth of the current disk technology, the high bandwidth requirement of multimedia objects, and the large size of these objects that almost always requires them to be disk resident. One approach to resolve this limitation is to decluster a multimedia object across multiple disk drives in order to employ the aggregate bandwidth of several disks to support the continuous retrieval (and display) of objects. This paper describes staggered striping as a novel technique to provide effective support for multiple users accessing the different objects in the database. Detailed simulations confirm the superiority of stagge...
A Performance Evaluation of Four Parallel Join Algorithms in a Shared-Nothing Multiprocessor Environment
, 1989
"... The join operator has been a cornerstone of relational database systems since their inception. As such, much time and effort has gone into making joins efficient. With the obvious trend towards multiprocessors, attention has focused on efficiently parallelizing the join operation. In this paper we a ..."
Abstract
-
Cited by 147 (14 self)
- Add to MetaCart
The join operator has been a cornerstone of relational database systems since their inception. As such, much time and effort has gone into making joins efficient. With the obvious trend towards multiprocessors, attention has focused on efficiently parallelizing the join operation. In this paper we analyze and compare four parallel join algorithms. Grace and Hybrid hash represent the class of hash-based join methods, Simple hash represents a looping algorithm with hashing, and our last algorithm is the more traditional sort-merge. The Gamma database machine serves as the host for the performance comparison. Gamma’s shared-nothing architecture with commercially available components is becoming increasingly common, both in research and in industry. 1.
Chained Declustering: A New Availability Strategy for Multiprocssor Database
- IN PROCEEDINGS OF 6TH INTERNATIONAL DATA ENGINEERING CONFERENCE
, 1990
"... This paper presents a new strategy for increasing the availability of data in multi-processor, shared-nothing database machines. This technique, termed chained declustering, is demonstrated to provide superior performance in the event of failures while maintaining a very high degree of data availabi ..."
Abstract
-
Cited by 112 (6 self)
- Add to MetaCart
This paper presents a new strategy for increasing the availability of data in multi-processor, shared-nothing database machines. This technique, termed chained declustering, is demonstrated to provide superior performance in the event of failures while maintaining a very high degree of data availability. Furthermore, unlike most earlier replication strategies, the implementation of chained declustering requires no special hardware and only minimal modifications to existing software.
Multiprocessor hash-based join algorithms
, 1985
"... This paper extends earlier research on hash-join algorithms to a multiprocessor architecture. Implementations of a number of centralized join algorithms are described and measured. Evaluation of these algorithms served to verify earlier analytical results. In addition, they demonstrate that bit vect ..."
Abstract
-
Cited by 101 (10 self)
- Add to MetaCart
This paper extends earlier research on hash-join algorithms to a multiprocessor architecture. Implementations of a number of centralized join algorithms are described and measured. Evaluation of these algorithms served to verify earlier analytical results. In addition, they demonstrate that bit vector filtering provides dramatic improvement in the performance of all algorithms including the sort merge join algorithm. Multiprocessor configurations of the centralized Grace and Hybrid hash-join algorithms are also presented. Both algorithms are shown to provide linear increases in throughput with corresponding increases in processor and disk resources. 1.
Parallel sorting on a shared-nothing architecture using probabilistic splitting
, 1991
"... We consider the problem of external sorting in a shared-nothing multiprocessor. A critical step in the algorithms we consider is to determine the range of sort keys to be handled by each processor. We consider two techniques for determining these ranges of sort keys: exact splitting, using a paralle ..."
Abstract
-
Cited by 76 (1 self)
- Add to MetaCart
We consider the problem of external sorting in a shared-nothing multiprocessor. A critical step in the algorithms we consider is to determine the range of sort keys to be handled by each processor. We consider two techniques for determining these ranges of sort keys: exact splitting, using a parallel version of the algorithm proposed by Iyer, Ricard, and Varman; and probabilistic splitting, which uses sampling to estimate quantiles. We present analytic results showing that probabilistic splitting performs better than exact splitting. Finally, we present experimental results from an implementation of sorting via probabilistic splitting in the Gamma parallel database machine.
An Evaluation of Non-Equijoin Algorithms
- IN VLDB
, 1991
"... A non-equijoin of relations R and S is a band join if the join predicate requires values in the join attribute of R to fall within a speci ed band about the values in the join attribute of S. We propose a new algorithm, termed a partitioned band join, for evaluating band joins. We present a comparis ..."
Abstract
-
Cited by 72 (0 self)
- Add to MetaCart
A non-equijoin of relations R and S is a band join if the join predicate requires values in the join attribute of R to fall within a speci ed band about the values in the join attribute of S. We propose a new algorithm, termed a partitioned band join, for evaluating band joins. We present a comparison between the partitioned band join algorithm and the classical sort-merge join algorithm (optimized for band joins) using both an analytical model and an implementation on top of the WiSS storage system. The results show that the partitioned band join algorithm outperforms sortmerge unless memory is scarce and the operands of the join are of equal size. We also describe a parallel implementation of the partitioned band join on the Gamma database machine, and present data from speedup and scaleup experiments demonstrating that the partitioned band join is efficiently parallelizable.
Continuous Retrieval of Multimedia Data Using Parallelism
, 1993
"... Multimedia information systems have emerged as an essential component of many application domains ranging from library information systems to entertainment technology. This is because these systems utilize a variety of human senses to provide an effective means of communicating information. However ..."
Abstract
-
Cited by 66 (12 self)
- Add to MetaCart
Multimedia information systems have emerged as an essential component of many application domains ranging from library information systems to entertainment technology. This is because these systems utilize a variety of human senses to provide an effective means of communicating information. However, most implementations of these systems (based on a workstation) cannot support a continuous display of high resolution audio and video data and suffer from fkequent disruptions and delays termed hiccups. This is due to the low U0 bandwidth of the current disk technology, the high bandwidth requirement of multimedia objects, and the large size of these objects which mquks them to be almost always disk resident. In this paper, we describe a parallel multimedia information system and the key technical ideas that enable it to support a real-time display of multimedii objects. These techniques are as follows. First, we decluster a multimedia object across several disk drives, enabling the system to utilize the aggregate bandwidth of multiple disks to retrieve an object in real-time. Second, the workload of an application is distributed evenly across the disk drives in order to maximize the processing capability of the system. To support simultaneous display of several multimedia objects for different users, we describe two altemative approaches. The first approach multitasks a disk drive among several requests while the second replicates the data and dedicates resources to each individual request. We investigate the trade-offs associated with each approach using a simulation model. Our results demonstrate the superiority of the replication approach.
Parallel Database Systems: The Future of Database Processing or a Passing Fad?
- SIGMOD RECORD
, 1991
"... Parallel database machine architectures have evolved from the use of exotic hardware to a software parallel dataflow architecture based on conventional shared-nothing hardware. These new designs provide impressive speedup and scaleup when processing relational database queries. This paper reviews th ..."
Abstract
-
Cited by 46 (6 self)
- Add to MetaCart
Parallel database machine architectures have evolved from the use of exotic hardware to a software parallel dataflow architecture based on conventional shared-nothing hardware. These new designs provide impressive speedup and scaleup when processing relational database queries. This paper reviews the techniques used by such systems, and surveys current commercial and research systems.

