Fast Parallel GPU-Sorting Using a Hybrid Algorithm
Cited by 22 (1 self)
This paper presents an algorithm for fast sorting of large lists using modern GPUs. The method achieves high speed by efficiently utilizing the parallelism of the GPU throughout the whole algorithm. Initially, a parallel bucketsort splits the list into enough sublists, which are then sorted in parallel using mergesort. The parallel bucketsort, implemented in NVIDIA’s CUDA, utilizes the synchronization mechanisms, such as atomic increment, that are available on modern GPUs. The mergesort requires scattered writing, which is exposed by CUDA and ATI’s Data Parallel Virtual Machine [1]. For lists with more than 512k elements, the algorithm performs better than the bitonic sort algorithms, which have been considered to be the fastest for GPU sorting, and is more than twice as fast for 8M elements. It is 6-14 times faster than single CPU quicksort for 1-8M elements respectively. In addition, the new GPU-algorithm sorts in n log n time as opposed to the standard n(log n)^2 for bitonic sort. Recently, it was shown how to implement GPU-based radixsort, of complexity n log n, to outperform bitonic sort. That algorithm is, however, still up to ~40% slower for 8M elements than the hybrid algorithm presented in this paper. GPU-sorting is memory bound and a key to the high performance is that the mergesort works on groups of four-float values to lower the number of memory fetches. Finally, we demonstrate the performance on sorting vertex distances for two large 3D-models; a key in for instance achieving correct transparency.
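The two-phase structure described in this abstract (a bucket split followed by a mergesort of each sublist) can be illustrated with a small sequential sketch. This is not the paper's CUDA implementation: the equal-width bucket boundaries, the `num_buckets` parameter, and the function names `merge_sort` and `hybrid_sort` are assumptions made for illustration, and the per-bucket sorts run one after another here rather than in parallel.

```python
def merge_sort(values):
    """Plain top-down merge sort; stands in for the per-sublist mergesort."""
    if len(values) <= 1:
        return list(values)
    mid = len(values) // 2
    left = merge_sort(values[:mid])
    right = merge_sort(values[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def hybrid_sort(values, num_buckets=4):
    """Split values into range buckets, sort each bucket, then concatenate."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if lo == hi:
        return list(values)
    # Assumption: equal-width value ranges decide the bucket of each element.
    width = (hi - lo) / num_buckets
    buckets = [[] for _ in range(num_buckets)]
    for v in values:
        index = min(int((v - lo) / width), num_buckets - 1)
        buckets[index].append(v)
    # Buckets cover increasing value ranges, so sorted buckets concatenate
    # into a fully sorted list. On a GPU each bucket could be sorted
    # independently of the others.
    result = []
    for bucket in buckets:
        result.extend(merge_sort(bucket))
    return result
```

Because every element of bucket k is no larger than every element of bucket k + 1, no final merge across buckets is needed, which is what makes the per-bucket sorts independent.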
Large deviations for quicksort
Journal of Algorithms, 1996
Cited by 18 (1 self)
Let Qn be the random number of comparisons made by quicksort in sorting n distinct keys, when we assume that all n! possible orderings are equally likely. Known results concerning moments for Qn do not show how rare it is for Qn to make large deviations from its mean. Here we give a good approximation to the probability of such a large deviation, and find that this probability is quite small. As well as the basic quicksort we consider the variant in which the partitioning key is chosen as the median of (2t + 1) keys. © 1996 Academic Press, Inc.
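The quantity Qn can be made concrete with a short sketch that counts key-to-pivot comparisons and averages them over all n! orderings for small n. The function names and the parameter `t` below are illustrative assumptions, as is the choice to take the sample from the first 2t + 1 keys and to leave the comparisons spent on finding the sample median uncounted; the paper's exact conventions may differ.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def quicksort_comparisons(keys, t=0):
    """Sort distinct keys; return (sorted keys, key-to-pivot comparison count).

    The pivot is the median of the first 2t + 1 keys (t = 0 gives basic
    quicksort with the first key as pivot).
    """
    if len(keys) <= 1:
        return list(keys), 0
    sample = keys[: 2 * t + 1]
    pivot = sorted(sample)[len(sample) // 2]
    smaller = [k for k in keys if k < pivot]
    larger = [k for k in keys if k > pivot]
    left, left_count = quicksort_comparisons(smaller, t)
    right, right_count = quicksort_comparisons(larger, t)
    # Every key other than the pivot is compared with the pivot once.
    return left + [pivot] + right, (len(keys) - 1) + left_count + right_count


def exact_mean_comparisons(n, t=0):
    """Average the comparison count over all n! equally likely orderings."""
    total = sum(
        quicksort_comparisons(list(order), t)[1]
        for order in permutations(range(n))
    )
    return Fraction(total, factorial(n))
```

For basic quicksort the mean is known to be 2(n + 1)H_n - 4n, where H_n is the n-th harmonic number, so `exact_mean_comparisons(4)` should equal 29/6. Enumeration is only practical for small n, but it shows what "deviation from the mean" refers to: an already sorted input of 5 keys costs 10 comparisons under the first-key pivot rule.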
The Orc Programming Language
2009
Cited by 10 (0 self)
Orc was originally presented as a process calculus. It has now evolved into a full programming language, which we describe in this paper. The language has the structure and feel of a functional programming language, yet it handles many non-functional aspects effectively, including spawning of concurrent threads, timeouts and mutable state. We first describe the original concurrency combinators of the process calculus. Next we describe a small functional programming language that forms the core language. Then we show how the concurrency combinators of the process calculus and the functional core language are integrated seamlessly. The resulting language and its supporting libraries have proven very effective in describing typical concurrent computations; we demonstrate how several practical concurrent programming problems are easily solved in Orc.
Quicksort: Combining Concurrency, Recursion, and Mutable Data Structures
Cited by 3 (2 self)
Quicksort [5] remains one of the most studied algorithms in computer science. It is important not only as a practical sorting method, but also as a splendid teaching aid for introducing recursion and systematic algorithm development. The algorithm has been studied extensively; so, it is natural to assume that everything that needs to be said about it has already been said. Yet, in attempting to code it using a recent programming language of our design, we discovered that its structure is more clearly expressed as a concurrent program that manipulates a shared mutable store, without any locking or explicit synchronization. In this paper, we describe the essential aspects of our programming language Orc [8], show a number of examples that combine its features in various forms, and then develop a concise description of Quicksort. We hope to highlight the importance of including concurrency, recursion and mutability within a single theory.
Adapt: Global Image Processing with the Split and Merge Model
1991
Cited by 2 (0 self)
Adapt is a simple architecture-independent language for both local and global image processing operations based on the split and merge programming model. In the split and merge model, the image is divided into portions, each portion is processed independently, and then the separate results are combined. The split and merge model is examined, and is found capable of computing any image processing operation that can be computed in forward or reverse order over a data structure. Moreover, the split and merge model is amenable to efficient implementation on a wide variety of parallel architectures. The Adapt language is described, and Adapt programming techniques are presented. Several algorithms for important image processing operations, including image warping, image connected components, and two-dimensional fast Fourier transform, are presented in detail. Implementations of Adapt on several parallel computers are described. The performance of several algorithms on the Sun, the Carnegie M...
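The split and merge model described in this abstract (divide the image, process each portion independently, combine the results) can be sketched for one global operation, a pixel-value histogram. This is not Adapt syntax: the function names `split`, `process`, `merge`, and `split_and_merge`, the list-of-rows image representation, and the horizontal-band split are all assumptions made for illustration.

```python
from collections import Counter


def split(image, portions):
    """Divide the image (a list of rows) into horizontal bands of rows."""
    rows_per_band = max(1, -(-len(image) // portions))  # ceiling division
    return [
        image[start:start + rows_per_band]
        for start in range(0, len(image), rows_per_band)
    ]


def process(band):
    """Per-portion work: histogram of the pixel values in one band."""
    return Counter(pixel for row in band for pixel in row)


def merge(left, right):
    """Combine two partial histograms into one."""
    return left + right


def split_and_merge(image, portions=2):
    """Run the three stages; each process() call is independent of the others."""
    result = Counter()
    for band in split(image, portions):
        result = merge(result, process(band))
    return result
```

Since `process` touches only its own band and `merge` is associative, the bands could be handed to separate processors and combined in any grouping, which is the property that makes the model portable across parallel architectures.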
Comparison and Analysis of Listening Test Methods for Development of Perceptual Speech Quality Assessment
2nd ISCA/DEGA Tutorial and Research Workshop on Perceptual Quality of Systems
Reliability of the listening test design has a great influence on the performance of the quality estimation model. In this paper we compare four different listening test designs by Monte Carlo simulation. Three common problems of interval scale ratings are included in the simulation, and their influence on the performance of estimating the underlying true quality is investigated. It turns out that, among these methods, randomly choosing partial trials for Scaled Comparison could be the most reliable way to perform a listening test under the influence of interval scale rating problems.
A Publication of The Science and Information Organization From the Desk of Managing
With monthly feature peer-reviewed articles and technical contributions, the Journal's content is dynamic, innovative, thought-provoking and directly beneficial to the readers in their work. The number of submissions has increased dramatically over the last issues. Our ability to accommodate this growth is due in large part to the terrific work of our Editorial Board. Some of the papers have an introductory character, some of them access highly desired extensions for a particular method, and some of them even introduce completely new approaches to computer science research in a very efficient manner. This diversity was strongly desired and should contribute to evoke a picture of this field at large. As a consequence only 29% of the received articles have been finally accepted for publication. With respect to all the contributions, we are happy to have assembled researchers whose names are linked to the particular manuscript they are discussing. Therefore, this issue may not just be used by the reader to get an introduction to the methods but also to the people behind that have been pivotal in the promotion of the respective research. By having in mind such future issues, we hope to establish a regular outlet for contributions and new findings in the field of Computer science and applications. Therefore, IJACSA, in general, could serve as a reliable resource for everybody loosely or tightly attached to this field of science. And if only a single young researcher is inspired by this issue to contribute in the future to solve some of the problems sketched
Algorithmic Attacks and Timing Leaks in Distributed Systems
2005
An important class of remotely applicable security attacks concerns time. You can attack somebody by making their algorithms run in their worst-case behavior rather than common-case behavior. Likewise, the processing time can disclose a secret. If an attacker can observe the time it takes for somebody to process a request, an attacker may learn something about the internal state. The first part of this thesis defines a new class of attacks that perform a remote denial of service by deliberately choosing inputs to make common algorithms slow. These attacks are widespread. We show that vulnerable hash tables are used by Perl and Squid and we illustrate an attack on the Bro IDS. The second part of this thesis analyzes the opportunities for determining a remote party’s secret by analyzing processing time remotely over the Internet. Our measurements show that an attacker can potentially time a remote host to 300 nanoseconds over a local area network and less than 20 microseconds over the Internet.
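The first kind of attack, choosing inputs that force a common algorithm into its worst case, can be illustrated on a toy chained hash table. The hash function `weak_hash`, the helper `colliding_keys`, and the table size are hypothetical and deliberately simplistic; they are not the hash functions used by Perl, Squid, or Bro, and the sketch counts key comparisons rather than measuring time.

```python
def weak_hash(key, num_buckets):
    """Toy string hash (an assumption): sum of character codes mod table size."""
    return sum(ord(ch) for ch in key) % num_buckets


def insert_all(keys, num_buckets=64):
    """Insert keys into a chained hash table; return total key comparisons."""
    table = [[] for _ in range(num_buckets)]
    comparisons = 0
    for key in keys:
        chain = table[weak_hash(key, num_buckets)]
        for existing in chain:
            comparisons += 1
            if existing == key:
                break  # key already present, nothing to insert
        else:
            chain.append(key)
    return comparisons


def colliding_keys(n):
    """Build n distinct keys with identical character sums (same bucket)."""
    return ["a" * i + "b" + "a" * (n - i - 1) for i in range(n)]
```

With 50 well-spread keys such as single distinct letters, every key lands in its own bucket and no comparisons are needed, while `colliding_keys(50)` puts all keys in one chain and costs 0 + 1 + ... + 49 = 1225 comparisons. The quadratic growth for attacker-chosen input is the effect the abstract describes as a remote denial of service.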
Editors Implementing
This paper is a practical study of how to implement the Quicksort sorting algorithm and its best variants on real computers, including how to apply various code optimization techniques. A detailed implementation combining the most effective improvements to Quicksort is given, along with a discussion of how to implement it in assembly language. Analytic results describing the performance of the programs are summarized. A variety of special situations are considered from a practical standpoint to illustrate Quicksort's wide applicability as an internal sorting method which requires negligible extra storage.
The power of a two-sided depth test and its application to CSG rendering and depth extraction
Shadow mapping is a technique for doing real-time shadowing. Recent work has shown that shadow mapping hardware can be used as a second depth test in addition to the z-test. In this paper, we explore the computational power provided by this second depth test, by demonstrating its utility in two separate applications. We first