K. Srinathan, "A performance prediction model for the CUDA GPGPU platform,"
- the 16th IEEE International Conference on High Performance Computing (HiPC)
, 2009
Cited by 14 (2 self)
The significant growth in computational power of modern Graphics Processing Units (GPUs), coupled with the advent of general purpose programming environments like NVIDIA's CUDA, has seen GPUs emerging as a very popular parallel computing platform. However, despite their popularity, there is no performance model of any GPGPU programming environment. The absence of such a model makes it difficult to definitively assess the suitability of the GPU for solving a particular problem and is a significant impediment to the mainstream adoption of GPUs as a massively parallel (super)computing platform. In this paper we present a performance prediction model for the CUDA GPGPU platform. This model encompasses the various facets of the GPU architecture like scheduling, memory hierarchy and pipelining, among others. We also perform experiments that demonstrate the effects of various memory access strategies. The proposed model can be used to analyze pseudocode for a CUDA kernel to obtain a performance estimate, in a way that is similar to performing asymptotic analysis. We illustrate the usage of our model and its accuracy with three case studies: matrix multiplication, list ranking, and histogram generation.
Scalable learning for object detection with GPU hardware
- in Intelligent Robots and Systems,
, 2009
Cited by 9 (3 self)
We consider the problem of robotic object detection of such objects as mugs, cups, and staplers in indoor environments. While object detection has made significant progress in recent years, many current approaches involve extremely complex algorithms, and are prohibitively slow when applied to large scale robotic settings. In this paper, we describe an object detection system that is designed to scale gracefully to large data sets and leverages upward trends in computational power (as exemplified by Graphics Processing Unit (GPU) technology) and memory. We show that our GPU-based detector is up to 90 times faster than a well-optimized software version and can be easily trained on millions of examples. Using inexpensive off-the-shelf hardware, it can recognize multiple object types reliably in just a few seconds per frame.
EFFECTIVE AND ACCELERATED INFORMATIVE FRAME FILTERING IN
, 2010
Colonoscopy is an endoscopic technique that allows a physician to inspect the mucosa of the human colon. Previous methods and software solutions to detect informative frames in a colonoscopy video (a process called informative frame filtering or IFF) have been hugely ineffective in (1) covering the proper definition of an informative frame in the broadest sense and (2) striking an optimal balance between accuracy and speed of classification in both real-time and non-real-time medical procedures. In my thesis, I propose a more effective method and faster software solutions for IFF. The method is more effective due to the introduction of a heuristic algorithm (derived from experimental analysis of typical colon features) for classification, which contributed to a 5-10% boost in various performance metrics for IFF. The software modules are faster due to the incorporation of sophisticated parallel-processing oriented coding techniques on modern microprocessors. Two IFF modules were created, one for post-procedure and the other for real-time use. Code optimizations through NVIDIA CUDA for GPU processing and/or CPU multi-threading concepts embedded in two significant microprocessor design philosophies (multi-core design and many-core design) resulted in a 5-fold acceleration for the post-procedure module and a 40-fold acceleration for the real-time module. Some innovative software modules, which are still in the testing phase, have been recently created to exploit the power of multiple GPUs together.
A Novel Multistage Image Registration Technique with Graph-based Region Descriptors
, 2013
To the best of my knowledge and as understood by the student in the Research Integrity and Copyright Disclaimer (Graduate School Form 20), this thesis/dissertation adheres to the provisions of Purdue University’s “Policy on Integrity in Research” and the use of copyrighted material.
unknown title
2. Multi robotic system structure The main object of research in this work is a web connected mobile robot system, for example, the inspec-The main problem of the following paper is control
EVALUATION OF MULTI-CORE ARCHITECTURES FOR IMAGE PROCESSING ALGORITHMS
, 2009
Diverse application scenarios and real-time constraints on computer-vision applications have motivated numerous explorations of computer architectures that provide more efficiency through hardware scalability by exploiting the characteristics of image processing and computer vision algorithms. The growing computational power and programmability of multi-core architectures provide great prospects for acceleration of image processing and computer vision algorithms which can be parallelized. This thesis undertakes a novel study to find unique attributes of three widely used algorithms in computer vision, and identifies the computer architecture(s) best suited for each algorithm. Significant acceleration over standard CPU implementations is obtained by exploiting data, thread and instruction parallelism provided by modern programmable graphics hardware. We test the following architectures most used for graphics and imaging applications: Intel Pentium 4 HT, Intel Core 2 Duo, NVidia 8 Series GPU and Sony PlayStation3 (PS3) CellBE. Additionally, we have optimized two image processing and computer vision algorithms, namely Canny edge detection and KLT tracking, for the PS3. The architectures' capabilities of handling three image processing algorithms of varying complexity were evaluated over standard inputs. The
unknown title
State estimation and control are intimately related
as part of the requirements for the degree of
as part of the requirements for the degree of Magíster en Ciencias de la Ingeniería (Master of Engineering Sciences)
Convolution of large 3D images on GPU and its decomposition
Performance analysis of a novel GPU computation-to-core mapping scheme for robust facet image modeling
, 2011
Modern graphics processing units (GPUs) are commodity data-parallel coprocessors capable of high performance computation and data throughput. It is well known that GPUs are ideal implementation platforms for image processing applications. However, the level of effort and expertise required to optimize application performance is still substantial. This paper investigates computation-to-core mapping strategies to probe the efficiency and scalability of the robust facet image modeling algorithm on GPUs. Our fine-grained computation-to-core mapping scheme achieves a significant performance gain over the standard pixel-wise mapping scheme. With in-depth performance comparisons across the two different mapping schemes, we analyze the impact of the level of parallelism on the GPU computation and suggest two principles for optimizing future image processing applications on the GPU platform.