Results 1 – 5 of 5
GPU Accelerating Speeded-Up Robust Features
"... Abstract — Many computer vision tasks require interest point detection and description, such as realtime visual navigation. We present a GPU implementation of the recently proposed SpeededUp Robust Feature extractor [1], currently the state of the art for this task. Robust feature descriptors can ..."
Abstract

Cited by 16 (0 self)
Abstract — Many computer vision tasks, such as real-time visual navigation, require interest point detection and description. We present a GPU implementation of the recently proposed Speeded-Up Robust Feature extractor [1], currently the state of the art for this task. Robust feature descriptors can give vast improvements in the quality and speed of subsequent steps, but require intensive up-front computation that is well-suited to inexpensive graphics hardware. We describe the algorithm’s translation to the GPU in detail, with several novel optimizations, including a new method of computing multi-dimensional parallel prefix sums. It operates at over 30 Hz at HD resolutions with thousands of features and in excess of 70 Hz at SD resolutions.
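In two dimensions, the multi-dimensional parallel prefix sum this abstract refers to is the integral image (summed-area table) that SURF uses to evaluate box filters in constant time. A minimal serial sketch (names are illustrative; a GPU version would instead run parallel prefix sums along rows, then columns):

```python
def integral_image(img):
    """Summed-area table: I[y][x] = sum of img[0..y][0..x].

    Serial reference implementation; on a GPU this becomes a
    parallel prefix sum over rows followed by one over columns.
    """
    h, w = len(img), len(img[0])
    I = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            I[y][x] = row_sum + (I[y - 1][x] if y > 0 else 0)
    return I

def box_sum(I, x0, y0, x1, y1):
    """Sum of the image over the inclusive rectangle
    [x0..x1] x [y0..y1], in O(1) time from the integral image —
    this is how SURF evaluates its box filters at any scale."""
    total = I[y1][x1]
    if x0 > 0:
        total -= I[y1][x0 - 1]
    if y0 > 0:
        total -= I[y0 - 1][x1]
    if x0 > 0 and y0 > 0:
        total += I[y0 - 1][x0 - 1]
    return total
```

Four table lookups per box, regardless of box size, is what makes the up-front prefix-sum computation pay off.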
Hierarchical bin buffering: Online local moments for dynamic external memory arrays
, 2004
"... Local moments are used for local regression, to compute statistical measures such as sums, averages, and standard deviations, and to approximate probability distributions. We consider the case where the data source is a very large I/O array of size n and we want to compute the first N local moments, ..."
Abstract

Cited by 5 (2 self)
Local moments are used for local regression, to compute statistical measures such as sums, averages, and standard deviations, and to approximate probability distributions. We consider the case where the data source is a very large I/O array of size n and we want to compute the first N local moments, for some constant N. Without precomputation, this requires O(n) time. We develop a sequence of algorithms of increasing sophistication that use precomputation and additional buffer space to speed up queries. The simpler algorithms partition the I/O array into consecutive ranges called bins, and they are applicable not only to local-moment queries, but also to algebraic queries in general (MAX, AVERAGE, SUM, etc.). With N buffers of size √n, time complexity drops to O(√n). A more sophisticated approach uses hierarchical buffering and has logarithmic time complexity, O(b log_b n), when using N hierarchical buffers of size n/b. Using Overlapped Bin Buffering, we show that only a single buffer is needed, as with wavelet-based algorithms, but using much less storage. Applications exist in multidimensional and statistical databases over massive data sets, interactive image processing, and visualization.
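The one-level bin-buffering idea for the SUM case can be sketched in a few lines: split the array into bins of size about √n, precompute one running total per bin, and answer any prefix-sum query with one buffer lookup plus a scan of at most one bin. Class and method names below are illustrative, not the paper's; the hierarchical scheme refines this to O(b log_b n).

```python
import math

class BinBuffer:
    """Prefix-sum queries over a static array in O(sqrt(n)) time
    with O(sqrt(n)) extra space: the one-level bin-buffering scheme
    sketched here as an assumption-laden illustration."""

    def __init__(self, data):
        self.data = data
        self.bin_size = max(1, math.isqrt(len(data)))
        # bin_prefix[k] = sum of all elements in complete bins 0..k-1
        self.bin_prefix = [0]
        for start in range(0, len(data), self.bin_size):
            chunk = sum(data[start:start + self.bin_size])
            self.bin_prefix.append(self.bin_prefix[-1] + chunk)

    def prefix_sum(self, i):
        """Sum of data[0..i] inclusive: one buffer lookup for the
        complete bins, plus a scan of at most bin_size elements."""
        k = (i + 1) // self.bin_size           # number of complete bins
        tail_start = k * self.bin_size
        return self.bin_prefix[k] + sum(self.data[tail_start:i + 1])
```

Other algebraic queries (MAX, AVERAGE) work the same way: store the bin aggregate instead of the bin sum and combine it with a scan of the partial bin.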
A family of computation-efficient parallel prefix algorithms
 WSEAS Trans. Comput.
, 2006
"... Abstract: We are interested in solving the prefix problem of n inputs using p < n processors on completely connected distributedmemory multicomputers (CCDMMs). This paper improves a previous work in three respects. First, the communication time of the previous algorithm is reduced significantly ..."
Abstract

Cited by 2 (0 self)
Abstract: We are interested in solving the prefix problem of n inputs using p < n processors on completely connected distributed-memory multicomputers (CCDMMs). This paper improves on a previous work in three respects. First, the communication time of the previous algorithm is reduced significantly. Second, we show that p(p + 1)/2 < n is required for the new algorithm and the original one to be applicable. Third, we argue that for the new algorithm to be faster than other algorithms run on CCDMMs, n > p³ is required. The new algorithm can achieve linear speedup and is cost-optimal when n = Ω(p² log p).
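The prefix problem with p < n processors is usually attacked block-wise: each processor scans its own block, only the p block totals are exchanged, and each block is then offset by the sum of everything before it. The serial simulation below shows that generic structure, which the paper's algorithms refine (in the communication pattern, not the overall shape):

```python
def parallel_prefix(data, p):
    """Block-wise prefix-sum scheme for n inputs on p < n processors,
    simulated serially.  Phases 1 and 3 are embarrassingly parallel;
    only phase 2 (the scan of p block totals) needs communication.
    """
    n = len(data)
    bounds = [n * k // p for k in range(p + 1)]   # p near-equal blocks

    # Phase 1 (parallel on a real machine): local prefix sums per block.
    local = []
    for k in range(p):
        block, acc = [], 0
        for x in data[bounds[k]:bounds[k + 1]]:
            acc += x
            block.append(acc)
        local.append(block)

    # Phase 2: exclusive scan of the p block totals (communication step).
    offsets, acc = [], 0
    for block in local:
        offsets.append(acc)
        acc += block[-1] if block else 0

    # Phase 3 (parallel): shift each block by the preceding total.
    return [v + offsets[k] for k in range(p) for v in local[k]]
```

Counting operations in this sketch also makes the applicability conditions in the abstract plausible: the benefit of parallelism only survives when n is sufficiently large relative to p.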
Proceedings of 3DPVT'08, the Fourth International Symposium on 3D Data Processing, Visualization and Transmission: GPU Accelerating Speeded-Up Robust Features
"... Abstract — Many computer vision tasks require interest point detection and description, such as realtime visual navigation. We present a GPU implementation of the recently proposed SpeededUp Robust Feature extractor [1], currently the state of the art for this task. Robust feature descriptors can ..."
Abstract
 Add to MetaCart
(Show Context)
Abstract — Many computer vision tasks require interest point detection and description, such as realtime visual navigation. We present a GPU implementation of the recently proposed SpeededUp Robust Feature extractor [1], currently the state of the art for this task. Robust feature descriptors can give vast improvements in the quality and speed of subsequent steps, but require intensive computation up front that is wellsuited to inexpensive graphics hardware. We describe the algorithm’s translation to the GPU in detail, with several novel optimizations, including a new method of computing multidimensional parallel prefix sums. It operates at over 30 Hz at HD resolutions with thousands of features and in excess of 70 Hz at SD resolutions. I.
Four Families of Computation-Efficient Parallel Prefix Algorithms for Multicomputers
, 2008
"... Four families of computationefficient parallel prefix algorithms for messagepassing multicomputers are presented. The first two families generalize previous algorithms that use only halfduplex communications, and thus can improve the running time. The third and fourth families adopt collective co ..."
Abstract
Four families of computation-efficient parallel prefix algorithms for message-passing multicomputers are presented. The first two families generalize previous algorithms that use only half-duplex communications, and thus can improve the running time. The third and fourth families adopt collective communication operations to reduce the communication times of the first two, respectively. The precondition of all the presented algorithms is also derived. Each of these families provides the flexibility of choosing either less computation time or less communication time to achieve the minimal running time, depending on the ratio of the time required by a communication step to the time required by a computation step. Keywords: Computation-efficient; Cost optimality; Message-passing multicomputers; Parallel algorithms; Prefix computation
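The selection rule the abstract describes can be illustrated with a toy cost model: given the ratio r of the time for one communication step to one computation step, pick the variant with the smallest modeled total time. The step counts below are placeholders for illustration only, not the paper's actual formulas.

```python
def pick_variant(variants, r):
    """Choose the prefix-algorithm variant with minimal modeled
    running time, where r = (time per communication step) /
    (time per computation step).  `variants` maps a name to
    (computation_steps, communication_steps); the counts used
    here are hypothetical, not taken from the paper.
    """
    return min(variants, key=lambda v: variants[v][0] + r * variants[v][1])

# Hypothetical step counts for two variants of the same prefix computation:
variants = {
    "less-computation": (10, 6),    # fewer local additions, more messages
    "less-communication": (14, 3),  # more local additions, fewer messages
}
```

On a network where messages are cheap (small r) the computation-lean variant wins; when messages are expensive (large r) the communication-lean variant does, which is exactly the trade-off the four families expose.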