Staircase Join: Teach a Relational DBMS to Watch its (Axis) Steps
- In Proc. of the 29th Int’l Conference on Very Large Databases (VLDB)
, 2003
Cited by 75 (23 self)
Relational query processors derive much of their effectiveness from the awareness of specific table properties like sort order, size, or absence of duplicate tuples. This text applies (and adapts) this successful principle to database-supported XML and XPath processing: the relational system is made tree aware, i.e., tree properties like subtree size, intersection of paths, inclusion or disjointness of subtrees are made explicit. We propose a local change to the database kernel, the staircase join, which encapsulates the necessary tree knowledge needed to improve XPath performance. Staircase join
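The tree properties named in this abstract (subtree inclusion, disjointness) can be made explicit with a preorder/postorder rank encoding. The sketch below is only an illustration under that assumption; the abstract does not spell out the encoding, and the names `encode`, `is_descendant`, and `descendant_step` are hypothetical. It is a naive evaluation of a descendant step, not the staircase join algorithm itself.

```python
from typing import Dict, List, Tuple


def encode(tree: Dict[str, List[str]], root: str) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Assign preorder and postorder ranks with one depth-first walk."""
    pre: Dict[str, int] = {}
    post: Dict[str, int] = {}

    def walk(node: str) -> None:
        pre[node] = len(pre)            # rank on entering the node
        for child in tree.get(node, []):
            walk(child)
        post[node] = len(post)          # rank on leaving the node

    walk(root)
    return pre, post


def is_descendant(v: str, c: str, pre: Dict[str, int], post: Dict[str, int]) -> bool:
    """v lies in the subtree below c iff it starts later and finishes earlier."""
    return pre[v] > pre[c] and post[v] < post[c]


def descendant_step(context: List[str], pre: Dict[str, int], post: Dict[str, int]) -> List[str]:
    """Descendant axis step for a set of context nodes (assumed semantics)."""
    # Prune context nodes already covered by an earlier context node:
    # their subtrees are included in the covering subtree.
    kept: List[str] = []
    for c in sorted(context, key=pre.__getitem__):
        if not kept or not is_descendant(c, kept[-1], pre, post):
            kept.append(c)
    # Naive scan in document order; the result is duplicate-free by construction.
    return [v for v in sorted(pre, key=pre.__getitem__)
            if any(is_descendant(v, c, pre, post) for c in kept)]


if __name__ == "__main__":
    tree = {"a": ["b", "c"], "b": ["d", "e"], "c": ["f"]}
    pre, post = encode(tree, "a")
    print(descendant_step(["b", "d"], pre, post))
```

Pruning the covered context nodes first means no node is produced twice, so no separate duplicate-elimination pass is needed in this sketch.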
GPUTeraSort: High Performance Graphics Coprocessor Sorting for Large Database Management
, 2006
Cited by 74 (9 self)
We present a new algorithm, GPUTeraSort, to sort billion-record wide-key databases using a graphics processing unit (GPU). Our algorithm uses the data and task parallelism on the GPU to perform memory-intensive and compute-intensive tasks while the CPU is used to perform I/O and resource management. We therefore exploit both the high-bandwidth GPU memory interface and the lower-bandwidth CPU main memory interface and achieve higher memory bandwidth than purely CPU-based algorithms. GPUTeraSort is a two-phase task pipeline: (1) read disk, build keys, sort using the GPU, generate runs, write disk, and (2) read, merge, write. It also pipelines disk transfers and achieves near-peak I/O performance. We have tested the performance of GPUTeraSort on billion-record files using the standard Sort benchmark. In practice, a 3 GHz Pentium IV PC with a $265 NVIDIA 7800 GT GPU is significantly faster than optimized CPU-based algorithms on much faster processors, sorting 60GB for a penny; the best reported PennySort price-performance. These results suggest that a GPU co-processor can significantly improve performance on large data processing tasks.
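The two-phase pipeline in this abstract has the shape of an external merge sort: sorted runs are generated first, then merged. The sketch below shows only that shape. It assumes in-memory lists stand in for disk runs and Python's built-in `sorted` stands in for the GPU sorting step; `generate_runs`, `merge_runs`, and `run_size` are hypothetical names, not part of GPUTeraSort.

```python
import heapq
from typing import Iterable, List


def generate_runs(records: Iterable[int], run_size: int) -> List[List[int]]:
    """Phase 1: read a chunk, sort it, and emit it as one sorted run."""
    runs: List[List[int]] = []
    chunk: List[int] = []
    for rec in records:
        chunk.append(rec)
        if len(chunk) == run_size:
            runs.append(sorted(chunk))   # stand-in for the GPU sort
            chunk = []
    if chunk:                            # final, possibly shorter run
        runs.append(sorted(chunk))
    return runs


def merge_runs(runs: List[List[int]]) -> List[int]:
    """Phase 2: k-way merge of the sorted runs."""
    return list(heapq.merge(*runs))


def two_phase_sort(records: Iterable[int], run_size: int) -> List[int]:
    return merge_runs(generate_runs(records, run_size))


if __name__ == "__main__":
    print(two_phase_sort([5, 3, 9, 1, 7, 2], run_size=4))
```

In a real system each run would be written to disk and streamed back during the merge; the run size would be chosen to fit the memory available to the sorting device.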
Fast Computation of Database Operations Using Graphics Processors
- Proc. of ACM SIGMOD
, 2004
Cited by 61 (14 self)
We present new algorithms on commodity graphics processors for performing fast computation of several common database operations. Specifically, we consider operations such as conjunctive selections, aggregations, and semi-linear queries, which are essential computational components of typical database, data warehousing, and data mining applications. While graphics processing units (GPUs) have been designed for fast display of geometric primitives, we utilize the inherent pipelining and parallelism, single instruction and multiple data (SIMD) capabilities, and vector processing functionality of GPUs, for evaluating boolean predicate combinations and semi-linear queries on attributes and executing database operations efficiently. Our algorithms take into account some of the limitations of the programming model of current GPUs and perform no data rearrangements. Our algorithms have been implemented on a programmable GPU (e.g. NVIDIA's GeForce FX 5900) and applied to databases consisting of up to a million records. We have compared their performance with an optimized implementation of CPU-based algorithms. Our experiments indicate that the graphics processor available on commodity computer systems is an effective co-processor for performing database operations.
Super-Scalar RAM-CPU Cache Compression
- In Proceedings of the International Conference on Data Engineering (IEEE ICDE)
, 2006
Cited by 49 (12 self)
CWI is a founding member of ERCIM, the European Research Consortium for Informatics and Mathematics. CWI's research has a theme-oriented structure and is grouped into four clusters. Listed below are the names of the clusters and in parentheses their acronyms.
Declarative information extraction using Datalog with embedded extraction predicates
- in VLDB
, 1997
Cited by 36 (8 self)
In this paper we argue that developing information extraction (IE) programs using Datalog with embedded procedural extraction predicates is a good way to proceed. First, compared to current ad-hoc composition using, e.g., Perl or C++, Datalog provides a cleaner and more powerful way to compose small extraction modules into larger programs. Thus, writing IE programs this way retains and enhances the important advantages of current approaches: programs are easy to understand, debug, and modify. Second, once we write IE programs in this framework, we can apply query optimization techniques to them. This gives programs that, when run over a variety of data sets, are more efficient than any monolithic program because they are optimized based on the statistics of the data on which they are invoked. We show how optimizing such programs raises challenges specific to text data that cannot be accommodated in the current relational optimization framework, then provide initial solutions. Extensive experiments over real-world data demonstrate that optimization is indeed vital for IE programs and that we can effectively optimize IE programs written in this proposed framework.
The pipelined set cover problem
, 2003
Cited by 26 (5 self)
A classical problem in query optimization is to find the optimal ordering of a set of possibly correlated selections. We provide an abstraction of this problem as a generalization of set cover called pipelined set cover, where the sets are applied sequentially to the elements to be covered and the elements covered at each stage are discarded. We show that several natural heuristics for this NP-hard problem, such as the greedy set-cover heuristic and a local-search heuristic, can be analyzed using a linear-programming framework. These heuristics lead to efficient algorithms for pipelined set cover that can be applied to order possibly correlated selections in conventional database systems as well as data-stream processing systems. We use our linear-programming framework to show that the greedy and local-search algorithms are 4-approximations for pipelined set cover. We extend our analysis to minimize the Lp-norm of the costs paid by the sets, where p ≥ 2 is an integer, to examine the improvement in performance when the total cost has increasing contribution from initial sets in the pipeline. Finally, we consider the online version of pipelined set cover and present a competitive algorithm with a logarithmic performance guarantee. Our analysis framework may be applicable to other problems in query optimization where it is important to account for correlations.
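The greedy heuristic mentioned in this abstract can be sketched for the selection-ordering setting as follows. This is a minimal illustration under assumed conventions, not the paper's formulation: each filter has a per-tuple cost, `drops[f]` is the set of tuple ids that filter `f` discards, and every tuple still alive pays the cost of each filter it reaches. The names `greedy_order` and `pipeline_cost` are hypothetical.

```python
from typing import Dict, List, Set


def greedy_order(drops: Dict[str, Set[int]], cost: Dict[str, float], n: int) -> List[str]:
    """Repeatedly pick the filter that discards the most remaining tuples per unit cost."""
    remaining = set(range(n))
    unused = set(drops)
    order: List[str] = []
    while unused:
        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(unused), key=lambda f: len(drops[f] & remaining) / cost[f])
        order.append(best)
        unused.remove(best)
        remaining -= drops[best]
    return order


def pipeline_cost(order: List[str], drops: Dict[str, Set[int]],
                  cost: Dict[str, float], n: int) -> float:
    """Total cost: each filter is charged once per tuple that reaches it."""
    remaining = set(range(n))
    total = 0.0
    for f in order:
        total += cost[f] * len(remaining)
        remaining -= drops[f]
    return total


if __name__ == "__main__":
    drops = {"a": {0, 1, 2, 3}, "b": {2, 3, 4}, "c": {0, 1}}
    cost = {"a": 1.0, "b": 1.0, "c": 1.0}
    order = greedy_order(drops, cost, n=6)
    print(order, pipeline_cost(order, drops, cost, n=6))
```

Because the benefit of a filter is measured on the tuples that survive the earlier filters, correlated filters are discounted automatically: once `a` has run, `c` discards nothing new and is pushed to the end.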
Implementing Database Operations Using SIMD Instructions
, 2002
Cited by 20 (2 self)
Modern CPUs have instructions that allow basic operations to be performed on several data elements in parallel. These instructions are called SIMD instructions, since they apply a single instruction to multiple data elements. SIMD technology was initially built into commodity processors in order to accelerate the performance of multimedia applications. SIMD instructions provide new opportunities for database engine design and implementation. We study various kinds of operations in a database context, and show how the inner loop of the operations can be accelerated using SIMD instructions. The use of SIMD instructions has two immediate performance benefits: It allows a degree of parallelism, so that many operands can be processed at once. It also often leads to the elimination of conditional branch instructions, reducing branch mispredictions.
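The branch-elimination benefit described in this abstract can be illustrated with a selection scan written in two ways. Python has no SIMD intrinsics, so the sketch below shows only the scalar, branch-free form of the idea (the comparison result is used as data rather than as a jump condition); the function names are hypothetical and this is not code from the paper.

```python
from typing import List


def select_branching(values: List[int], threshold: int) -> List[int]:
    """Conventional scan: one conditional branch per element."""
    out: List[int] = []
    for i, v in enumerate(values):
        if v < threshold:
            out.append(i)
    return out


def select_branch_free(values: List[int], threshold: int) -> List[int]:
    """Predicated scan: always write, advance the cursor by the comparison result."""
    out = [0] * len(values)
    j = 0
    for i, v in enumerate(values):
        out[j] = i              # unconditional write; overwritten if not selected
        j += v < threshold      # bool counts as 0 or 1, so no branch on the data
    return out[:j]


if __name__ == "__main__":
    data = [5, 1, 8, 3, 9, 2]
    print(select_branch_free(data, 4))
```

Both functions return the positions of the qualifying elements; the second does the same amount of work for every element regardless of the data, which is the property that makes such loops amenable to SIMD execution.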
A cache-efficient sorting algorithm for database and data mining computations using graphics processors
, 2005
Cited by 16 (4 self)
We present a fast sorting algorithm using graphics processors (GPUs) that adapts well to database and data mining applications. Our algorithm uses texture mapping and blending functionalities of GPUs to implement an efficient bitonic sorting network. We take into account the communication bandwidth overhead to the video memory on the GPUs and reduce the memory bandwidth requirements. We also present strategies to exploit the tile-based computational model of GPUs. Our new algorithm has a memory-efficient data access pattern and we describe an efficient instruction dispatch mechanism to improve the overall sorting performance. We have used our sorting algorithm to accelerate join-based queries and stream mining algorithms. Our results indicate up to an order of magnitude improvement over prior CPU-based and GPU-based sorting algorithms.
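A bitonic sorting network, the building block named in this abstract, can be sketched sequentially as below. This is the textbook iterative network, assuming a power-of-two input length; it does not model the texture mapping or blending steps of the GPU implementation, and `bitonic_sort` is a hypothetical name.

```python
from typing import List


def bitonic_sort(data: List[int]) -> List[int]:
    """Sort with a bitonic network; the input length must be a power of two."""
    n = len(data)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(data)
    k = 2
    while k <= n:                    # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:                # distance between compared elements
            for i in range(n):       # every pair in this stage is independent
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    # Swap when the pair is out of order for its direction.
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


if __name__ == "__main__":
    print(bitonic_sort([7, 3, 6, 1, 8, 2, 5, 4]))
```

The sequence of compare-and-swap pairs depends only on the input length, never on the values, which is why each inner stage can be executed as one data-parallel pass.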
Vectorized Data Processing on the Cell Broadband Engine
- In Proc. of the Third International Workshop on Data Management on New Hardware
, 2007
Cited by 6 (0 self)
Engine for database processing. We start by outlining the main architectural features of Cell and use microbenchmarks to characterize the latency and throughput of its memory infrastructure. Then, we discuss the challenges of porting RDBMS software to Cell: (i) all computations need to be SIMD-ized, (ii) all performance-critical branches need to be eliminated, (iii) a very small and hard limit on program code size should be respected. While we argue that conventional database implementations, i.e. row-stores with Volcano-style tuple pipelining, are a hard fit to Cell, it turns out that the three challenges are quite easily met in databases that use column-wise processing. We managed to implement a proof-of-concept port of the vectorized query processing model of MonetDB/X100 on Cell by running the operator pipeline on the PowerPC, but having it execute the vectorized primitives (data parallel) on its SPE cores. A performance evaluation on TPC-H Q1 shows that vectorized query processing on Cell can beat conventional PowerPC and Itanium2 CPUs by a factor of 20.

