Fast Parallel GPU-Sorting Using a Hybrid Algorithm
Cited by 16 (1 self)
This paper presents an algorithm for fast sorting of large lists using modern GPUs. The method achieves high speed by efficiently utilizing the parallelism of the GPU throughout the whole algorithm. Initially, a parallel bucketsort splits the list into enough sublists, which are then sorted in parallel using mergesort. The parallel bucketsort, implemented in NVIDIA’s CUDA, utilizes the synchronization mechanisms, such as atomic increment, that are available on modern GPUs. The mergesort requires scattered writing, which is exposed by CUDA and ATI’s Data Parallel Virtual Machine [1]. For lists with more than 512k elements, the algorithm performs better than the bitonic sort algorithms, which have been considered to be the fastest for GPU sorting, and is more than twice as fast for 8M elements. It is 6-14 times faster than single-CPU quicksort for 1-8M elements respectively. In addition, the new GPU algorithm sorts in $n \log n$ time as opposed to the standard $n (\log n)^2$ for bitonic sort. Recently, it was shown how to implement GPU-based radix sort, of complexity $n \log n$, to outperform bitonic sort. That algorithm is, however, still up to about 40% slower for 8M elements than the hybrid algorithm presented in this paper. GPU sorting is memory bound, and a key to the high performance is that the mergesort works on groups of four-float values to lower the number of memory fetches. Finally, we demonstrate the performance on sorting vertex distances for two large 3D models, a key step in, for instance, achieving correct transparency.
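The two-phase structure described in this abstract (bucketsort to split, mergesort within each sublist) can be illustrated with a small sequential sketch. This is not the paper's CUDA implementation: it runs on the CPU, the function names `merge_sort` and `hybrid_sort` and the `num_buckets` parameter are hypothetical, and the equal-width bucketing is an assumption made only for illustration.

```python
def merge_sort(values):
    """Plain top-down merge sort; returns a new sorted list."""
    if len(values) <= 1:
        return list(values)
    mid = len(values) // 2
    left, right = merge_sort(values[:mid]), merge_sort(values[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def hybrid_sort(values, num_buckets=4):
    """Split values into ordered buckets, then merge-sort each bucket.

    Assumption: equal-width value ranges per bucket. On a GPU each bucket
    could be sorted independently; here they are processed one by one.
    """
    if not values:
        return []
    lo, hi = min(values), max(values)
    if lo == hi:
        return list(values)
    width = (hi - lo) / num_buckets
    buckets = [[] for _ in range(num_buckets)]
    for v in values:
        # Clamp so the maximum value lands in the last bucket.
        index = min(int((v - lo) / width), num_buckets - 1)
        buckets[index].append(v)
    result = []
    for bucket in buckets:
        # Buckets cover increasing value ranges, so concatenation is sorted.
        result.extend(merge_sort(bucket))
    return result
```

Because every element of bucket k is no larger than every element of bucket k+1, the sorted sublists never need to be merged with each other, which is what makes the per-bucket work independent.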
Faster Lightweight Suffix Array Construction
Cited by 9 (3 self)
The suffix array is a data structure formed by sorting the suffixes of a string into lexicographical order. It is important for a variety of applications, perhaps most notably pattern matching, pattern discovery and block-sorting data compression. The last decade has seen intensive research toward efficient construction of suffix arrays with algorithms striving not only to be fast, but also “lightweight” (in the sense that they use small working memory). In this paper we describe a new lightweight suffix array construction algorithm. By exploiting several interesting properties of suffixes in combination with cache-conscious programming we achieve excellent runtimes. Extensive experiments show our approach to be faster than all other known algorithms for the task.
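To make the definition concrete, the following minimal sketch builds a suffix array by sorting suffix start positions directly and then uses it for pattern matching. It is a naive construction for illustration only, not the lightweight algorithm the abstract describes; the names `suffix_array` and `find_occurrences` are hypothetical.

```python
def suffix_array(text):
    """Return suffix start positions ordered by the suffixes they begin.

    Naive approach: compares whole suffixes, so it is far slower and more
    memory-hungry than a dedicated construction algorithm.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Find all start positions of pattern using binary search over sa."""
    m = len(pattern)
    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


sa = suffix_array("banana")
print(sa)                                     # [5, 3, 1, 0, 4, 2]
print(find_occurrences("banana", sa, "ana"))  # [1, 3]
```

All suffixes that begin with the pattern form one contiguous run in the suffix array, which is why two binary searches are enough to locate every occurrence.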
Fast Focus+Context Visualization of Large Scientific Data
, 2004
Cited by 3 (2 self)
Visualization of high-dimensional and time-dependent data, resulting from computational simulation, is a very challenging and resource-consuming task. Here, feature-based visualization approaches, which aim at a user-controlled reduction of the data shown at one instant in time, prove to be useful.
An introspective algorithm for the integer determinant
- In: Proceedings of Transgressive Computing 2006
, 2006
Cited by 3 (2 self)
ljk.imag.fr/membres/{Jean-Guillaume.Dumas;Anna.Urbanska} We present an algorithm for computing the determinant of an integer matrix A. The algorithm is introspective in the sense that it uses several distinct algorithms that run in a concurrent manner. During the course of the algorithm, partial results coming from distinct methods can be combined. Then, depending on the current running time of each method, the algorithm can emphasize a particular variant. With the use of very fast modular routines for linear algebra, our implementation is an order of magnitude faster than other existing implementations. Moreover, we prove that the expected complexity of our algorithm is only $O\left(n^3 \log^{2.5}(n\|A\|)\right)$ bit operations in the case of random dense matrices, where $n$ is the dimension and $\|A\|$ is the largest entry of the matrix in absolute value.
GPU-Quicksort: A Practical Quicksort Algorithm for Graphics Processors
Cited by 2 (0 self)
In this paper we describe GPU-Quicksort, an efficient Quicksort algorithm suitable for highly parallel multi-core graphics processors. Quicksort has previously been considered an inefficient sorting solution for graphics processors, but we show that in CUDA, NVIDIA’s programming platform for general purpose computations on graphical processors, GPU-Quicksort performs better than the fastest known sorting implementations for graphics processors, such as radix and bitonic sort. Quicksort can thus be seen as a viable alternative for sorting large quantities of data on graphics processors.
A Portable Cache Profiler Based on Source-Level Instrumentation, tech. rep.
, 2003
Cited by 1 (0 self)
1.2 Algorithms Should Be Cache-Conscious
Stochastic Database Cracking: Towards Robust Adaptive Indexing in Main-Memory Column-Stores
Cited by 1 (1 self)
Modern business applications and scientific databases call for inherently dynamic data storage environments. Such environments are characterized by two challenging features: (a) they have little idle system time to devote to physical design; and (b) there is little, if any, a priori workload knowledge, while the query and data workload keeps changing dynamically. In such environments, traditional approaches to index building and maintenance cannot apply. Database cracking has been proposed as a solution that allows on-the-fly physical data reorganization, as a collateral effect of query processing. Cracking aims to continuously and automatically adapt indexes to the workload at hand, without human intervention. Indexes are built incrementally, adaptively, and on demand. Nevertheless, as we show, existing adaptive indexing methods fail to deliver workload-robustness; they perform much better with random workloads than with others. This frailty derives from the inelasticity with which these approaches interpret each query as a hint on how data should be stored. Current cracking schemes blindly reorganize the data within each query’s range, even if that results in successive expensive operations with minimal indexing benefit. In this paper, we introduce stochastic cracking, a significantly more resilient approach to adaptive indexing. Stochastic cracking also uses each query as a hint on how to reorganize data, but not blindly so; it gains resilience and avoids performance bottlenecks by deliberately applying certain arbitrary choices in its decision-making. Thereby, we bring adaptive indexing forward to a mature formulation that confers the workload-robustness previous approaches lacked. Our extensive experimental study verifies that stochastic cracking maintains the desired properties of original database cracking while at the same time it performs well with diverse realistic workloads.
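The idea of reorganizing a column as a side effect of range queries can be sketched roughly as follows. This is a simplified illustration and not the paper's algorithm: the class name `CrackerColumn`, its methods, and in particular the choice of one extra crack at a randomly picked value per query are assumptions standing in for the "arbitrary choices" the abstract mentions. Real cracking partitions in place; the sketch copies each piece for clarity.

```python
import bisect
import random


class CrackerColumn:
    """Column that is incrementally partitioned by the queries it answers."""

    def __init__(self, values, seed=0):
        self.data = list(values)
        self.pivots = []     # sorted pivot values seen so far
        self.positions = []  # positions[i]: first index holding a value >= pivots[i]
        self.rng = random.Random(seed)

    def _crack(self, pivot):
        """Split the piece containing pivot; return the split position."""
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.positions[i]  # already cracked on this value
        start = self.positions[i - 1] if i > 0 else 0
        end = self.positions[i] if i < len(self.positions) else len(self.data)
        piece = self.data[start:end]
        smaller = [v for v in piece if v < pivot]
        larger = [v for v in piece if v >= pivot]
        self.data[start:end] = smaller + larger
        split = start + len(smaller)
        self.pivots.insert(i, pivot)
        self.positions.insert(i, split)
        return split

    def range_query(self, low, high, stochastic=True):
        """Return all values v with low <= v < high, cracking on the way."""
        if stochastic and self.data:
            # Assumed stochastic step: one extra crack at a random value,
            # so pieces shrink even when query bounds are unhelpful.
            self._crack(self.rng.choice(self.data))
        lo_pos = self._crack(low)
        hi_pos = self._crack(high)
        return self.data[lo_pos:hi_pos]


column = CrackerColumn([13, 16, 4, 9, 2, 12, 7, 1, 19, 3])
print(sorted(column.range_query(5, 13)))  # [7, 9, 12]
```

After each query the column is a little more ordered, so later queries touch smaller pieces; setting `stochastic=False` gives the purely query-driven behaviour that the abstract describes as blind.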
XGILF – A Conceptual Frame for Compiling and Linking Generic Libraries
The classical approach to compiler construction impedes development of languages that support the generic programming paradigm, e.g. C++ and Ada95. We will document the problems and describe a conceptual implementation frame that better serves this purpose. It is built around an XML-based intermediate representation of the code. The key point in our approach is to defer code generation to link time or run time. Moreover, we will show how the characteristics of our frame enable new high-level optimizations, like runtime algorithm selection for generic functions.

