Results 1 - 10
of
99
The OO7 benchmark
, 1993
"... The OO7 Benchmark represents a comprehensive test of OODBMS performance. In this report we describe the benchmark and present performance results from its implementation in four OODB systems. It is our hope that the OO7 Benchmark will provide useful insight for end-users evaluating the performance o ..."
Abstract
-
Cited by 258 (11 self)
- Add to MetaCart
The OO7 Benchmark represents a comprehensive test of OODBMS performance. In this report we describe the benchmark and present performance results from its implementation in four OODB systems. It is our hope that the OO7 Benchmark will provide useful insight for end-users evaluating the performance of OODB systems; we also hope that the research community will nd that OO7 provides a database schema, instance, and workload that is useful for evaluating new techniques and algorithms for OODBMS implementation.
The Gamma database machine project
- IEEE Transactions on Knowledge and Data Engineering
, 1990
"... This paper describes the design of the Gamma database machine and the techniques employed in its implementation. Gamma is a relational database machine currently operating on an Intel iPSC/2 hypercube with 32 processors and 32 disk drives. Gamma employs three key technical ideas which enable the arc ..."
Abstract
-
Cited by 203 (27 self)
- Add to MetaCart
This paper describes the design of the Gamma database machine and the techniques employed in its implementation. Gamma is a relational database machine currently operating on an Intel iPSC/2 hypercube with 32 processors and 32 disk drives. Gamma employs three key technical ideas which enable the architecture to be scaled to 100s of processors. First, all relations are horizontally partitioned across multiple disk drives enabling relations to be scanned in parallel. Second, novel parallel algorithms based on hashing are used to implement the complex relational operators such as join and aggregate functions. Third, dataflow scheduling techniques are used to coordinate multioperator queries. By using these techniques it is possible to control the execution of very complex queries with minimal coordination- a necessity for configurations involving a very large number of processors. In addition to describing the design of the Gamma software, a thorough performance evaluation of the iPSC/2 hypercube version of Gamma is also presented. In addition to measuring the effect of relation size and indices on the response time for selection, join, aggregation, and update queries, we also analyze the performance of Gamma relative to the number of processors employed when the sizes of the input relations are kept constant (speedup) and when the sizes of the input relations are increased proportionally to the number of processors (scaleup). The speedup results obtained for both selection and join queries are linear; thus, doubling the number of processors
Join Indices
- ACM Transactions on Database Systems
, 1987
"... In new application areas of relational database systems, such as artificial intelligence, the join operator is used more extensively than in conventional applications. In this paper, we propose a simple data structure, called a join index, for improving the performance of joins in the context of com ..."
Abstract
-
Cited by 188 (2 self)
- Add to MetaCart
In new application areas of relational database systems, such as artificial intelligence, the join operator is used more extensively than in conventional applications. In this paper, we propose a simple data structure, called a join index, for improving the performance of joins in the context of complex queries. For most of the joins, updates to join indices incur very little overhead. Some properties of a join index are (i) its efficient use of memory and adaptiveness to parallel execution, data type join predicates, (iv) its support for multirelation clustering, and (v) its use in representing directed graphs and in evaluating recursive queries. Finally, the analysis of the join algorithm using join indices shows its excellent performance.
Dataflow Query Execution in a Parallel Main-Memory Environment
- Distributed and Parallel Databases
, 1991
"... In this paper, the performance and characteristics of the execution of various join-trees on a parallel DBMS are studied. The results of this study, are a step into the direction of the design of a query optimization strategy that is fit for parallel execution of complex queries. Among others, synch ..."
Abstract
-
Cited by 159 (4 self)
- Add to MetaCart
In this paper, the performance and characteristics of the execution of various join-trees on a parallel DBMS are studied. The results of this study, are a step into the direction of the design of a query optimization strategy that is fit for parallel execution of complex queries. Among others, synchronization issues are identified to limit the performance gain from parallelism. A new hashjoin algorithm, called Pipelining hash-join is introduced that has fewer synchronization constraints than the known hash-join algorithms. Also, the behavior of individual join operations in a join-tree is studied in a simulation experiment. The results show that the Pipelining hash-join algorithm yields a better performance for multi-join queries. Also, the format of the optimal join-tree appears to depend on the size of the operands of the join: A multi-join between small operands performs best with a bushy schedule; larger operands are better off with a linear schedule. The results from the simulatio...
A Performance Evaluation of Four Parallel Join Algorithms in a Shared-Nothing Multiprocessor Environment
, 1989
"... The join operator has been a cornerstone of relational database systems since their inception. As such, much time and effort has gone into making joins efficient. With the obvious trend towards multiprocessors, attention has focused on efficiently parallelizing the join operation. In this paper we a ..."
Abstract
-
Cited by 147 (14 self)
- Add to MetaCart
The join operator has been a cornerstone of relational database systems since their inception. As such, much time and effort has gone into making joins efficient. With the obvious trend towards multiprocessors, attention has focused on efficiently parallelizing the join operation. In this paper we analyze and compare four parallel join algorithms. Grace and Hybrid hash represent the class of hash-based join methods, Simple hash represents a looping algorithm with hashing, and our last algorithm is the more traditional sort-merge. The Gamma database machine serves as the host for the performance comparison. Gamma’s shared-nothing architecture with commercially available components is becoming increasingly common, both in research and in industry. 1.
Practical Selectivity Estimation through Adaptive Sampling
, 1992
"... Recently we have proposed an adaptive, random sampling algorithm for general query size estimation. In earlier work we analyzed the asymptotic efficiency and accuracy of the algorithm; in this paper we investigate its practicality as applied to selects and joins. First, we extend our previous analys ..."
Abstract
-
Cited by 146 (6 self)
- Add to MetaCart
Recently we have proposed an adaptive, random sampling algorithm for general query size estimation. In earlier work we analyzed the asymptotic efficiency and accuracy of the algorithm; in this paper we investigate its practicality as applied to selects and joins. First, we extend our previous analysis to provide significantly improved bounds on the amount of sampling necessary for a given level of accuracy. Next, we provide "sanity bounds" to deal with queries for which the underlying data is extremely skewed or the query result is very small. Finally, we report on the performance of the estimation algorithm as implemented in a host language on a commercial relational system. The results are encouraging, even with this loose coupling between the estimation algorithm and the DBMS.
An Interval Classifier for Database Mining Applications
, 1992
"... We are given a large population database that contains information about population instances. The population is known to comprise of m groups, but the population instances are not labeled with the group identification. Also given is a population sample (much smaller than the population but represen ..."
Abstract
-
Cited by 104 (7 self)
- Add to MetaCart
We are given a large population database that contains information about population instances. The population is known to comprise of m groups, but the population instances are not labeled with the group identification. Also given is a population sample (much smaller than the population but representative ofit) in which the group labels of the instances are known. We present aninterval classifier (IC) which generates a classification function for each group that can be used to efficiently retrieve all instances of the specied group from the population database. To allow IC to be embedded in interactive loops to answer adhoc queries about attributes with missing values, IC has been designed to be efficient in the generation of classification functions. Preliminary experimental results indicate that IC not only has retrieval and classifier generation efficiency advantages, but also compares favorably in the classification accuracy with current tree classifirs, such as ID3, which were primarily designed for minimizing classification errors. We also describe some new applications that arise from encapsulating the classification capability in database systems and discuss extensions to IC for it to be used in these new application domains.
XJoin: A Reactively-Scheduled Pipelined Join Operator
- IEEE DATA ENGINEERING BULLETIN
, 2000
"... Wide-area distribution raises significant performance problems for traditional query processing techniques as data access becomes less predictable due to link congestion, load imbalances, and temporary outages. Pipelined query execution is a promising approach to coping with unpredictability in such ..."
Abstract
-
Cited by 102 (8 self)
- Add to MetaCart
Wide-area distribution raises significant performance problems for traditional query processing techniques as data access becomes less predictable due to link congestion, load imbalances, and temporary outages. Pipelined query execution is a promising approach to coping with unpredictability in such environments as it allows scheduling to adjust to the arrival properties of the data. We have developed a non-blocking join operator, called XJoin, which has a small memory footprint, allowing many such operators to be active in parallel. XJoin is optimized to produce initial results quickly and can hide intermittent delays in data arrival by reactively scheduling background processing. We show that XJoin is an effective solution for providing fast query responses to users even in the presence of slow and bursty remote sources.
Multiprocessor hash-based join algorithms
, 1985
"... This paper extends earlier research on hash-join algorithms to a multiprocessor architecture. Implementations of a number of centralized join algorithms are described and measured. Evaluation of these algorithms served to verify earlier analytical results. In addition, they demonstrate that bit vect ..."
Abstract
-
Cited by 101 (10 self)
- Add to MetaCart
This paper extends earlier research on hash-join algorithms to a multiprocessor architecture. Implementations of a number of centralized join algorithms are described and measured. Evaluation of these algorithms served to verify earlier analytical results. In addition, they demonstrate that bit vector filtering provides dramatic improvement in the performance of all algorithms including the sort merge join algorithm. Multiprocessor configurations of the centralized Grace and Hybrid hash-join algorithms are also presented. Both algorithms are shown to provide linear increases in throughput with corresponding increases in processor and disk resources. 1.
On the Generation of Spatiotemporal Datasets
, 1999
"... . An efficient benchmarking environment for spatiotemporal access methods should at least include modules for generating synthetic datasets, storing datasets (real datasets included), collecting and running access structures, and visualizing experimental results. Focusing on the dataset reposito ..."
Abstract
-
Cited by 93 (11 self)
- Add to MetaCart
. An efficient benchmarking environment for spatiotemporal access methods should at least include modules for generating synthetic datasets, storing datasets (real datasets included), collecting and running access structures, and visualizing experimental results. Focusing on the dataset repository module, a collection of synthetic data that would simulate a variety of real life scenarios is required. Several algorithms have been implemented in the past to generate static spatial (point or rectangular) data, for instance, following a predefined distribution in the workspace. However, by introducing motion, and thus temporal evolution in spatial object definition, generating synthetic data tends to be a complex problem. In this paper, we discuss the parameters to be considered by a generator for such type of data, propose an algorithm, called "Generate_Spatio_Temporal_Data" (GSTD), which generates sets of moving point or rectangular data that follow an extended set of distri...

