Query evaluation techniques for large databases - ACM Computing Surveys, 1993
Cited by 592 (7 self)
Database management systems will continue to manage large data volumes. Thus, efficient algorithms for accessing and manipulating large sets and sequences will be required to provide acceptable performance. The advent of object-oriented and extensible database systems will not solve this problem. On the contrary, modern data models exacerbate it: In order to manipulate large sets of complex objects as efficiently as today’s database systems manipulate simple records, query processing algorithms and software will become more complex, and a solid understanding of algorithm and architectural issues is essential for the designer of database management software. This survey provides a foundation for the design and implementation of query execution facilities in new database management systems. It describes a wide array of practical query evaluation techniques for both relational and post-relational database systems, including iterative execution of complex query evaluation plans, the duality of sort- and hash-based set matching algorithms, types of parallel query execution and their implementation, and special operators for emerging database application domains.
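The survey's "iterative execution of complex query evaluation plans" refers to plans whose operators are evaluated on demand. Below is a minimal sketch of that style, assuming an open/next/close interface and hypothetical operator classes `Scan`, `Filter`, and `Project` over in-memory tuples. The root pulls one row at a time, and each operator pulls from its child only when it needs more input, so no intermediate result is materialized between operators.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Row = Tuple  # a row is a tuple of column values


class Scan:
    """Leaf operator: returns rows from an in-memory list (stand-in for a table scan)."""

    def __init__(self, rows: Sequence[Row]):
        self.rows = list(rows)
        self.pos = 0

    def open(self) -> None:
        self.pos = 0

    def next(self) -> Optional[Row]:
        if self.pos >= len(self.rows):
            return None  # end of stream
        row = self.rows[self.pos]
        self.pos += 1
        return row

    def close(self) -> None:
        pass


class Filter:
    """Returns only child rows that satisfy the predicate, pulling one row at a time."""

    def __init__(self, child, predicate: Callable[[Row], bool]):
        self.child = child
        self.predicate = predicate

    def open(self) -> None:
        self.child.open()

    def next(self) -> Optional[Row]:
        while (row := self.child.next()) is not None:
            if self.predicate(row):
                return row
        return None

    def close(self) -> None:
        self.child.close()


class Project:
    """Keeps only the listed column positions of each child row."""

    def __init__(self, child, columns: List[int]):
        self.child = child
        self.columns = columns

    def open(self) -> None:
        self.child.open()

    def next(self) -> Optional[Row]:
        row = self.child.next()
        return None if row is None else tuple(row[i] for i in self.columns)

    def close(self) -> None:
        self.child.close()


def run(plan) -> List[Row]:
    """Drives a plan from its root: open, call next() until exhausted, then close."""
    plan.open()
    out: List[Row] = []
    while (row := plan.next()) is not None:
        out.append(row)
    plan.close()
    return out
```

For example, `run(Project(Filter(Scan([(1, "a", 10), (2, "b", 20), (3, "c", 30)]), lambda r: r[2] > 15), [0, 1]))` evaluates the filter and the projection in a single pass and returns `[(2, "b"), (3, "c")]`.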
The state of the art in distributed query processing - ACM Computing Surveys, 2000
Cited by 182 (2 self)
Distributed data processing is fast becoming a reality. Businesses want to have it for many reasons, and they often must have it in order to stay competitive. While much of the infrastructure for distributed data processing is already in place (e.g., modern network technology), there are a number of issues that still make distributed data processing a complex undertaking: (1) distributed systems can become very large, involving thousands of heterogeneous sites including PCs and mainframe server machines; (2) the state of a distributed system changes rapidly because the load of sites varies over time and new sites are added to the system; (3) legacy systems need to be integrated, and such legacy systems usually have not been designed for distributed data processing but now need to interact with other (modern) systems in a distributed environment. This paper presents the state of the art of query processing for distributed database and information systems. The paper presents the "textbook" architecture for distributed query processing and a series of techniques that are particularly useful for distributed database systems. These techniques include special join techniques, techniques to exploit intra-query parallelism, techniques to reduce communication costs, and techniques to exploit caching and replication of data. Furthermore, the paper discusses different kinds of distributed systems, such as client-server, middleware (multi-tier), and heterogeneous database systems, and shows how query processing works in these systems.
Categories and subject descriptors: E.5 [Data]: Files; H.2.4 [Database Management Systems]: distributed databases, query processing; H.2.5 [Heterogeneous Databases]: data translation
General terms: algorithms; performance
Additional key words and phrases: query optimization; query execution; client-server databases; middleware; multi-tier architectures; database application systems; wrappers; replication; caching; economic models for query processing; dissemination-based information systems
A Performance Evaluation of Four Parallel Join Algorithms in a Shared-Nothing Multiprocessor Environment, 1989
Cited by 147 (14 self)
The join operator has been a cornerstone of relational database systems since their inception. As such, much time and effort has gone into making joins efficient. With the obvious trend towards multiprocessors, attention has focused on efficiently parallelizing the join operation. In this paper we analyze and compare four parallel join algorithms. Grace and Hybrid hash represent the class of hash-based join methods, Simple hash represents a looping algorithm with hashing, and our last algorithm is the more traditional sort-merge. The Gamma database machine serves as the host for the performance comparison. Gamma's shared-nothing architecture with commercially available components is becoming increasingly common, both in research and in industry.
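The abstract names Grace and Hybrid hash as the hash-based methods compared on Gamma. Below is a minimal, single-process sketch of the Grace idea, assuming in-memory lists stand in for the disk-resident (or per-node) partitions; the function names and the bucket count `k` are hypothetical. Both inputs are partitioned with the same hash function, so matching tuples always land in buckets with the same index, and each bucket pair is then joined with an in-memory build-and-probe step. Hybrid hash differs mainly in keeping the first bucket's hash table in memory during partitioning, which this sketch does not do.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple  # a row is a tuple of column values


def partition(rows: List[Row], key: int, k: int) -> List[List[Row]]:
    """Splits rows into k buckets by hashing the join attribute at position `key`."""
    buckets: List[List[Row]] = [[] for _ in range(k)]
    for row in rows:
        buckets[hash(row[key]) % k].append(row)
    return buckets


def grace_hash_join(r: List[Row], s: List[Row], r_key: int, s_key: int, k: int = 4) -> List[Row]:
    """Equi-joins r and s on r[r_key] == s[s_key]; output rows are r-row + s-row."""
    # Phase 1: partition both inputs with the same hash function.
    r_parts = partition(r, r_key, k)
    s_parts = partition(s, s_key, k)
    result: List[Row] = []
    # Phase 2: join each bucket pair independently (build on r, probe with s).
    for r_bucket, s_bucket in zip(r_parts, s_parts):
        table: Dict[object, List[Row]] = defaultdict(list)
        for r_row in r_bucket:
            table[r_row[r_key]].append(r_row)
        for s_row in s_bucket:
            # .get() does not insert missing keys into the defaultdict
            for r_row in table.get(s_row[s_key], []):
                result.append(r_row + s_row)
    return result
```

Because each bucket pair is independent, a shared-nothing system can assign different buckets to different nodes; here they are simply processed in a loop.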
Benchmarking Database Systems: A Systematic Approach - Proceedings of the 1983 Very Large Database Conference, 1983
Cited by 128 (13 self)
This paper describes a customized database and a comprehensive set of queries that can be used for systematic benchmarking of relational database systems. Designing this database and a set of carefully tuned benchmarks represents a first attempt at developing a scientific methodology for performance evaluation of database management systems. We have used this database to perform a comparative evaluation of the database machine DIRECT, the "university" and "commercial" versions of the INGRES database system, the relational database system ORACLE, and the IDM 500 database machine. We present a subset of our measurements (for the single-user case only) that constitute a preliminary performance evaluation of these systems.
NOTE TO THE READER
It is important for the reader to recognize that the results presented in this paper represent the performance of the various database systems at ONE point in time and that new releases of the various systems will undoubtedly perform differently. The objective of this research was not to make a definitive statement as to which is the best relational database system on the market today. Rather, our goal was to develop a standard set of benchmarks that could be used by database system designers for evaluating changes to their systems and by users for selecting the system that best suits their needs. It is also imperative that the reader understand that the results presented here in no way measure the performance of the various systems in a multiuser environment. We are currently developing a methodology for benchmarking database systems in this environment.
A Case for Intelligent Disks (IDISKs), 1998
Cited by 106 (4 self)
Decision support systems (DSS) and data warehousing workloads comprise an increasing fraction of the database market today. I/O capacity and associated processing requirements for DSS workloads are increasing at a rapid rate, doubling roughly every nine to twelve months [38]. In response to this increasing storage and computational demand, we present a computer architecture for decision support database servers that utilizes "intelligent" disks (IDISKs). IDISKs utilize low-cost embedded general-purpose processing, main memory, and high-speed serial communication links on each disk. IDISKs are connected to each other via these serial links and high-speed crossbar switches, overcoming the I/O bus bottleneck of conventional systems. By off-loading computation from expensive desktop processors, IDISK systems may improve cost-performance. More importantly, the IDISK architecture allows the processing of the system to scale with increasing storage demand.
Multiprocessor hash-based join algorithms, 1985
Cited by 101 (10 self)
This paper extends earlier research on hash-join algorithms to a multiprocessor architecture. Implementations of a number of centralized join algorithms are described and measured. Evaluation of these algorithms served to verify earlier analytical results. In addition, they demonstrate that bit vector filtering provides dramatic improvement in the performance of all algorithms, including the sort-merge join algorithm. Multiprocessor configurations of the centralized Grace and Hybrid hash-join algorithms are also presented. Both algorithms are shown to provide linear increases in throughput with corresponding increases in processor and disk resources.
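The abstract reports that bit vector filtering improved every algorithm measured, including sort-merge. Below is a minimal sketch of the filter itself, assuming a single hash function and a bit vector of `m` bits; the function names and `m` are hypothetical, and the paper's exact scheme is not given in the abstract. While reading the building relation, one bit is set per join value; probe-side tuples whose bit is clear are then discarded before they are partitioned, sorted, or sent to another processor. A clear bit proves that no match exists, while a set bit may be a false positive, so the join itself must still compare keys.

```python
from typing import Hashable, List, Tuple

Row = Tuple  # a row is a tuple of column values


def build_bit_vector(rows: List[Row], key: int, m: int) -> bytearray:
    """Sets bit hash(v) mod m for every join value v of the building relation."""
    bits = bytearray((m + 7) // 8)
    for row in rows:
        h = hash(row[key]) % m  # Python's % with positive m is never negative
        bits[h // 8] |= 1 << (h % 8)
    return bits


def may_match(bits: bytearray, value: Hashable, m: int) -> bool:
    """False means no building tuple can match; True may be a false positive."""
    h = hash(value) % m
    return bool(bits[h // 8] & (1 << (h % 8)))


def filter_probe_side(s: List[Row], key: int, bits: bytearray, m: int) -> List[Row]:
    """Drops probe tuples that cannot join, before any further join work is spent on them."""
    return [row for row in s if may_match(bits, row[key], m)]
```

With building keys 1, 2 and 3 and probe keys 1 through 100, every probe tuple with key 1, 2 or 3 is guaranteed to survive the filter; any other tuple survives only on a hash collision, and larger `m` makes collisions rarer.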
Query Processing in a System for Distributed Databases (SDD-1) - ACM Transactions on Database Systems, 1981
Cited by 63 (0 self)
This paper describes the techniques used to optimize relational queries in the SDD-1 distributed database system. Queries are submitted to SDD-1 in a high-level procedural language called Datalanguage. Optimization begins by translating each Datalanguage query into a relational calculus form called an envelope, which is essentially an aggregate-free QUEL query. This paper is primarily concerned with the optimization of envelopes. Envelopes are processed in two phases. The first phase executes relational operations at various sites of the distributed database in order to delimit a subset of the database that contains all data relevant to the envelope. This subset is called a reduction of the database. The second phase transmits the reduction to one designated site, and the query is executed locally at that site. The critical optimization problem is to perform the reduction phase efficiently. Success depends on designing a good repertoire of operators to use during this phase, and an effective algorithm for deciding which of these operators to use in processing a given envelope against a given database. The principal reduction operator that we employ is called a ...
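The abstract is cut off before it names SDD-1's principal reduction operator. The semijoin is the classic operator for this kind of reduction, and the sketch below assumes it: the site holding S sends only the distinct join values of S, the site holding R keeps just the R tuples that match, and only that reduced R travels on. The function names are hypothetical, and the transmission count (one unit per value or tuple sent) is a deliberate simplification rather than SDD-1's cost model.

```python
from typing import Hashable, List, Set, Tuple

Row = Tuple  # a row is a tuple of column values


def join_values(s: List[Row], key: int) -> Set[Hashable]:
    """Projection of S onto its join column: what S's site would transmit."""
    return {row[key] for row in s}


def semijoin(r: List[Row], key: int, values: Set[Hashable]) -> List[Row]:
    """R semijoin S: the tuples of R whose join value occurs in S."""
    return [row for row in r if row[key] in values]


def reduce_then_ship(r: List[Row], r_key: int, s: List[Row], s_key: int) -> Tuple[List[Row], int]:
    """Returns the reduced R and a crude count of values plus tuples transmitted."""
    values = join_values(s, s_key)        # step 1: S's site sends its join values to R's site
    reduced = semijoin(r, r_key, values)  # step 2: R's site discards tuples with no match
    shipped = len(values) + len(reduced)  # step 3: only the reduced R is sent onward
    return reduced, shipped
```

For a 10-tuple R whose join values 0 through 9 meet an S holding only the values 2 and 5, the reduction sends 2 values and 2 tuples instead of all 10 tuples of R. Whether a semijoin pays off in general depends on how selective it is relative to the cost of sending the join values.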
Distributed Active Catalogs and Meta-Data Caching in Descriptive Name Services - In IEEE International Conference on Distributed Computing Systems, 1993
Cited by 30 (7 self)
Today's global internetworks challenge the ability of name services and other information services to locate data quickly. We introduce a distributed active catalog and meta-data caching for optimizing queries in this environment. Our active catalog constrains the search space for a query by returning a list of data repositories where the answer to the query is likely to be found. Meta-data caching improves performance by keeping frequently used characterizations of the search space close to the user, and eliminating active catalog communication and processing costs. When searching for query responses, our techniques contact only the small percentage of the data repositories with actual responses, resulting in search times of a few seconds. We implemented a distributed active catalog and meta-data caching in a prototype descriptive name service called "Nomenclator." We present performance results for Nomenclator in a search space of 1000 data repositories.
A performance analysis of the Gamma database machine - Proceedings of the ACM-SIGMOD International Conference on Management of Data, 1988
Cited by 23 (8 self)
This paper presents the results of an initial performance evaluation of the Gamma database machine based on an expanded version of the single-user Wisconsin benchmark. In our experiments we measured the effect of relation size and indices on response time for selection, join, and aggregation queries, and single-tuple updates. A Teradata DBC/1012 database machine of similar size is used as a basis for interpreting the results obtained. We also analyze the performance of Gamma relative to the number of processors employed and study the impact of varying the memory size and disk page size on the execution time of a variety of selection and join queries. We analyze and interpret the results of these experiments based on our understanding of the system hardware and software, and conclude with an assessment of the strengths and weaknesses of the two machines.
Computer Architecture Support for Database Applications, 1999
Cited by 13 (2 self)
Computer Architecture Support for Database Applications, by Kimberly Kristine Keeton. Doctor of Philosophy in Computer Science, University of California at Berkeley. Professor David A. Patterson, Chair.
Database workloads are an important class of applications, responsible for one-third of the symmetric multiprocessor (SMP) server market. Despite their importance, they are seldom used in computer architecture performance evaluations, which favor technical applications, such as SPEC. Database applications are often avoided because they are difficult to study in fully-scaled configurations, for reasons including large hardware requirements and complicated software configuration and tuning issues. This dissertation addresses several of the challenges posed by database workloads. First, we characterize the architectural behavior of two standard database workloads, namely online transaction processing (OLTP) and decision support (DSS), running on a commercial database on a commodity Intel...

