Query evaluation techniques for large databases
- ACM Computing Surveys, 1993
Cited by 592 (7 self)
Database management systems will continue to manage large data volumes. Thus, efficient algorithms for accessing and manipulating large sets and sequences will be required to provide acceptable performance. The advent of object-oriented and extensible database systems will not solve this problem. On the contrary, modern data models exacerbate it: In order to manipulate large sets of complex objects as efficiently as today’s database systems manipulate simple records, query processing algorithms and software will become more complex, and a solid understanding of algorithm and architectural issues is essential for the designer of database management software. This survey provides a foundation for the design and implementation of query execution facilities in new database management systems. It describes a wide array of practical query evaluation techniques for both relational and post-relational database systems, including iterative execution of complex query evaluation plans, the duality of sort- and hash-based set matching algorithms, types of parallel query execution and their implementation, and special operators for emerging database application domains.
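The "iterative execution of complex query evaluation plans" named in this abstract can be pictured as operators that each pull rows from their input on demand. The following is a minimal sketch in Python, assuming generator-based operators; the names `scan`, `select`, and `hash_join` and the sample tables are illustrative assumptions, not taken from the survey.

```python
from collections import defaultdict


def scan(rows):
    """Leaf operator: yield rows one at a time from an in-memory table."""
    for row in rows:
        yield row


def select(child, predicate):
    """Filter operator: pass through only rows that satisfy the predicate."""
    for row in child:
        if predicate(row):
            yield row


def hash_join(build, probe, build_key, probe_key):
    """Hash-based matching: build a table on one input, probe with the other."""
    table = defaultdict(list)
    for row in build:
        table[row[build_key]].append(row)
    for row in probe:
        for match in table.get(row[probe_key], []):
            yield {**match, **row}


# Hypothetical sample data
departments = [{"dept_id": 1, "dept": "R&D"}, {"dept_id": 2, "dept": "Sales"}]
employees = [
    {"name": "Ann", "dept_id": 1, "salary": 120},
    {"name": "Bob", "dept_id": 2, "salary": 80},
    {"name": "Cy", "dept_id": 1, "salary": 95},
]

# A small plan: join departments with the employees earning more than 90.
plan = hash_join(
    scan(departments),
    select(scan(employees), lambda r: r["salary"] > 90),
    "dept_id",
    "dept_id",
)
result = list(plan)
```

Because every operator is a generator, no intermediate result is materialized except the hash table on the build input; rows flow through the plan only when the consumer asks for them.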
Data Compression
- ACM Computing Surveys, 1987
Cited by 81 (3 self)
This paper surveys a variety of data compression methods spanning almost forty years of research, from the work of Shannon, Fano and Huffman in the late 40's to a technique developed in 1986. The aim of data compression is to reduce redundancy in stored or communicated data, thus increasing effective data density. Data compression has important application in the areas of file storage and distributed systems. Concepts from information theory, as they relate to the goals and evaluation of data compression methods, are discussed briefly. A framework for evaluation and comparison of methods is constructed and applied to the algorithms presented. Comparisons of both theoretical and empirical natures are reported and possibilities for future research are suggested.
INTRODUCTION
Data compression is often referred to as coding, where coding is a very general term encompassing any special representation of data which satisfies a given need. Information theory is defined to be the study of eff...
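As an illustration of what "reducing redundancy" means in practice, the sketch below builds a Huffman code, one of the classical methods the abstract attributes to the late 1940s work. It is a minimal Python sketch under stated assumptions: the function names and the sample message are hypothetical, and the code operates on characters of a string rather than on any format described in the survey.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Return a dict mapping each symbol in text to its Huffman codeword."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit codeword.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: partial codeword}).
    # The unique tie_breaker keeps the dicts from ever being compared.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        # Merge the two lightest subtrees, prefixing their codewords.
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def encode(text, code):
    """Concatenate the codewords of the symbols in text."""
    return "".join(code[s] for s in text)


def decode(bits, code):
    """Decode a bit string; works because Huffman codes are prefix-free."""
    inverse = {v: k for k, v in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


message = "abracadabra"  # hypothetical sample input
code = huffman_code(message)
bits = encode(message, code)
```

Frequent symbols receive short codewords and rare symbols long ones, so the encoded message is shorter than a fixed-width encoding whenever the symbol frequencies are skewed.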
Adding Compression to a Full-Text Retrieval System
, 1995
Cited by 75 (25 self)
We describe the implementation of a data compression scheme as an integral and transparent layer within a full-text...
Data Compression Using Dynamic Markov Modelling
- The Computer Journal, 1986
Cited by 69 (3 self)
A method to dynamically construct Markov models that describe the characteristics of binary messages is developed. Such models can be used to predict future message characters and can therefore be used as a basis for data compression. To this end, the Markov modelling technique is combined with Guazzo coding to produce a powerful method of data compression. The method has the advantage of being adaptive: messages may be encoded or decoded with just a single pass through the data. Experimental results reported here indicate that the Markov modelling approach generally achieves much better data compression than that observed with competing methods on typical computer data.
Categories and Subject Descriptors: E.4 [Coding and Information Theory]: data compaction and compression; C.2.0 [Computer-Communication Networks]: data communications
General Terms: Experimentation, Algorithms
Additional Key Words and Phrases: Data compression, text compression, adaptive coding, Guazzo coding
January...
The Implementation and Performance of Compressed Databases
, 1998
Cited by 44 (5 self)
In this paper, we show how compression can be integrated into a relational database system. Specifically, we describe how the storage manager, the query execution engine, and the query optimizer of a database system can be extended to deal with compressed data. Our main result is that compression can significantly improve the response time of queries if very light-weight compression techniques are used. We will present such light-weight compression techniques and give the results of running the TPC-D benchmark on both a database compressed in this way and a non-compressed database, using the AODB database system, an experimental database system that was developed at the Universities of Mannheim and Passau. Our benchmark results demonstrate that compression indeed offers high performance gains (up to 55%) for IO-intensive queries and moderate gains for CPU-intensive queries. Compression can, however, also increase the running time of certain update operations. In all, we recommend to extend today's da...
Data Compression and Database Performance
- In Proc. ACM/IEEE-CS Symp. on Applied Computing, 1991
Cited by 32 (0 self)
Data compression is widely used in data management to save storage space and network bandwidth. In this report, we outline the performance improvements that can be achieved by exploiting data compression in query processing. The novel idea is to leave data in compressed state as long as possible, and to only uncompress data when absolutely necessary. We will show that many query processing algorithms can manipulate compressed data just as well as decompressed data, and that processing compressed data can speed query processing by a factor much larger than the compression factor.
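One way to picture "leaving data in compressed state as long as possible" is dictionary encoding, where equality predicates and equi-joins compare small integer codes instead of the original values. The sketch below is a minimal Python illustration under that assumption; the encoding scheme, the function names, and the sample columns are hypothetical and are not the implementation described in the paper.

```python
def build_dictionary(values):
    """Assign a small integer code to each distinct value (hypothetical scheme)."""
    codes = {}
    for v in values:
        if v not in codes:
            codes[v] = len(codes)
    return codes


def compress(column, codes):
    """Replace every value in the column by its integer code."""
    return [codes[v] for v in column]


# Hypothetical city columns from two relations that share one domain.
orders_city = ["Boulder", "Denver", "Boulder", "Aspen"]
stores_city = ["Denver", "Boulder"]

# One dictionary per domain, so equal values always get equal codes.
codes = build_dictionary(orders_city + stores_city)
orders_c = compress(orders_city, codes)
stores_c = compress(stores_city, codes)

# Equality selection: compress the constant once, then compare codes only.
target = codes["Boulder"]
selected_rows = [i for i, c in enumerate(orders_c) if c == target]

# Equi-join performed entirely on compressed codes.
store_index = {}
for j, c in enumerate(stores_c):
    store_index.setdefault(c, []).append(j)
join_pairs = [
    (i, j) for i, c in enumerate(orders_c) for j in store_index.get(c, [])
]

# Decompress only when producing the final output.
decode = {code: value for value, code in codes.items()}
joined_cities = [decode[orders_c[i]] for i, _ in join_pairs]
```

The selection and the join never touch the original strings; decompression happens once, for the rows that survive to the output.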
Information Retrieval Systems for Large Document Collections
- In Proceedings of the Third Text Retrieval Conference (TREC-3)
Cited by 30 (2 self)
Practical information retrieval systems must manage large volumes of data, often divided into several collections that may be held on separate machines. Techniques for locating matches to queries must therefore consider identification of probable collections as well as identification of documents that are probable answers. Furthermore, the large amounts of data involved motivate the use of compression, but in a dynamic environment compression is problematic, because as new text is added the compression model slowly becomes inappropriate. In this paper we describe solutions to both of these problems. We show that use of centralised blocked indexes can reduce overall query processing costs in a multi-collection environment, and that careful application of text compression techniques allows collections to grow by several orders of magnitude without recompression becoming necessary.
1 Introduction
Practical information systems are required to store many gigabytes of data while supporting ...
Query execution in column-oriented database systems
, 2008
Cited by 9 (2 self)
There are two obvious ways to map a two-dimensional relational database table onto a one-dimensional storage interface: store the table row-by-row, or store the table column-by-column. Historically, database system implementations and research have focused on the row-by-row data layout, since it performs best on the most common application for database systems: business transactional data processing. However, there is a set of emerging applications for database systems for which the row-by-row layout performs poorly. These applications are more analytical in nature; their goal is to read through the data to gain new insight and use it to drive decision making and planning. In this dissertation, we study the problem of poor performance of row-by-row data layout for these emerging
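The two mappings onto a one-dimensional store can be made concrete with a small example. This is a minimal Python sketch, assuming a hypothetical three-column table held in flat lists; it shows only the layouts and one aggregate over them, not any technique from the dissertation itself.

```python
# Hypothetical three-column table: (name, age, balance)
table = [
    ("alice", 30, 1200.0),
    ("bob", 41, 800.0),
    ("carol", 25, 1500.0),
]
NUM_COLS = 3

# Row-by-row layout: all fields of one tuple are adjacent in storage.
row_store = [field for row in table for field in row]

# Column-by-column layout: all values of one attribute are adjacent.
column_store = [list(col) for col in zip(*table)]


def total_from_rows(store, col):
    """Sum one attribute in the row layout by striding across every tuple."""
    return sum(store[i] for i in range(col, len(store), NUM_COLS))


def total_from_columns(store, col):
    """Sum one attribute in the column layout by reading one contiguous list."""
    return sum(store[col])


balance_rows = total_from_rows(row_store, 2)
balance_cols = total_from_columns(column_store, 2)
```

Both layouts give the same answer, but the column layout reads only the values of the requested attribute, which is the access pattern analytical queries tend to have.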
Absolute Bounds on Set Intersection and Union Sizes From Distribution Information
, 1988
Cited by 3 (0 self)
Estimation of set intersection and union sizes is important for access method selection for a database and other data retrieval problems. Absolute bounds on sizes are often easier to compute than estimates, requiring no distributional or independence assumptions, and can answer many of the same needs. We present a catalog of quick, closed-form bounds on set intersection and union sizes; they can be expressed as rules and managed by a rule-based system architecture. These methods use a variety of statistics precomputed on the data, and exploit homomorphisms (onto mappings) of the data items onto distributions that can be more easily analyzed. The methods can be used anytime, but tend to work best when there are strong or complex correlations in the data. This circumstance is poorly handled by the standard independence-assumption and distributional-assumption estimates, and hence our methods fill a need.
1. Why bounds?
Good estimation of the sizes of set intersections and union...
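The simplest absolute bounds of this kind follow from set sizes alone, with no distributional or independence assumptions. The Python sketch below shows these elementary bounds only; it is not the paper's catalog, which draws on richer precomputed statistics, and the function names and the example sizes are hypothetical.

```python
def intersection_bounds(size_a, size_b, universe):
    """Absolute bounds on |A ∩ B| given |A|, |B| and the universe size.

    Lower bound: A and B together cannot exceed the universe, so any
    excess of |A| + |B| over the universe size must be shared.
    Upper bound: the intersection cannot exceed the smaller set.
    """
    lower = max(0, size_a + size_b - universe)
    upper = min(size_a, size_b)
    return lower, upper


def union_bounds(size_a, size_b, universe):
    """Absolute bounds on |A ∪ B| given |A|, |B| and the universe size.

    Lower bound: the union contains the larger set.
    Upper bound: at most |A| + |B| items, and never more than the universe.
    """
    lower = max(size_a, size_b)
    upper = min(universe, size_a + size_b)
    return lower, upper


# Hypothetical example: 1000 records, 700 satisfy predicate A, 500 satisfy B.
inter = intersection_bounds(700, 500, 1000)
union = union_bounds(700, 500, 1000)
```

For the example sizes, at least 200 records must satisfy both predicates regardless of how the two predicates are correlated, which is the kind of guarantee an independence-based estimate cannot give.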
Context-Sensitive Mobile Database Summarisation
, 2003
Cited by 2 (1 self)
In mobile computing environments, as a result of the reduced capacity of local storage, it is commonly not feasible to replicate entire datasets on each mobile unit. In addition, reliable, secure and economical access to central servers is not always possible. Moreover, since mobile computers are designed to be portable, they are also physically small and thus often unable to hold or process the large amounts of data held in centralised databases. As many systems are only as useful as the data they can process, the support provided by database and system management middleware for applications in mobile environments is an important driver for the uptake of this technology by application providers and thus also for the wider use of the technology.

