Results 1 - 10 of 428
Automatic Code Generation for SIMD Hardware Accelerators
"... Abstract. SIMD hardware accelerators offer an alternative to manycores when energy consumption and performance are critical. For scientific computing, GPGPUs are used in many computers of the top-500. But embedded processors also use accelerators. However such heterogeneous platforms trade ease of d ..."
Liquid SIMD: Abstracting SIMD hardware using lightweight dynamic mapping
- In HPCA’07
, 2007
"... Microprocessor designers commonly utilize SIMD accelerators and their associated instruction set extensions to provide substantial performance gains at a relatively low cost for media applications. One of the most difficult problems with using SIMD accelerators is forward migration to newer genera ..."
Cited by 11 (0 self)
generations. With larger hardware budgets and more demands for performance, SIMD accelerators evolve with both larger data widths and increased functionality with each new generation. However, this causes difficult problems in terms of binary compatibility, software migration costs, and expensive redesign
The design and implementation of FFTW3
- PROCEEDINGS OF THE IEEE
, 2005
"... FFTW is an implementation of the discrete Fourier transform (DFT) that adapts to the hardware in order to maximize performance. This paper shows that such an approach can yield an implementation that is competitive with hand-optimized libraries, and describes the software structure that makes our cu ..."
Cited by 726 (3 self)
Compiling For SIMD Within A Register
- 11th Annual Workshop on Languages and Compilers for Parallel Computing (LCPC98)
, 1998
"... Although SIMD (Single Instruction stream Multiple Data stream) parallel computers have existed for decades, it is only in the past few years that a new version of SIMD has evolved: SIMD Within A Register (SWAR). Unlike other styles of SIMD hardware, SWAR models are tuned to be integrated withi ..."
Cited by 35 (1 self)
Sierra: A SIMD Extension for C++
"... Nowadays, SIMD hardware is omnipresent in computers. Nonetheless, many software projects hardly make use of SIMD instructions: applications are usually written in general-purpose languages like C++. However, general-purpose languages only provide poor abstractions for SIMD programming, enforcing an ..."
Asynchronous Problems on SIMD Parallel Computers
- IEEE Trans. Parallel and Distributed Systems
, 1995
"... One of the essential problems in parallel computing is: can SIMD machines handle asynchronous problems? This is a difficult, unsolved problem because of the mismatch between asynchronous problems and SIMD architectures. We propose a solution to let SIMD machines handle general asynchronous ..."
Cited by 6 (0 self)
problems. Our approach is to implement a runtime support system which can run MIMD-like software on SIMD hardware. The runtime support system, named P kernel, is thread-based. There are two major advantages of the thread-based model. First, for application problems with irregular and/or unpredictable
Maximizing SIMD Resource Utilization in GPGPUs with SIMD Lane Permutation
- In 40th International Symposium on Computer Architecture (ISCA-40)
, 2013
"... Current GPUs maintain high programmability by abstracting the SIMD nature of the hardware as independent concurrent threads of control with hardware responsible for generating predicate masks to utilize the SIMD hardware for different flows of control. This dynamic masking leads to poor utilizati ..."
Cited by 3 (0 self)
Boost.SIMD: Generic Programming for Portable SIMDization
, 2012
"... SIMD extensions have been a feature of choice for processor manufacturers for a couple of decades. Designed to exploit data parallelism in applications at the instruction level, these extensions still require a high level of expertise or the use of potentially fragile compiler support or v ..."
Cited by 3 (1 self)
or vendor-specific libraries. While a large fraction of their theoretical accelerations can be obtained using such tools, exploiting such hardware becomes tedious as soon as application portability across hardware is required. In this paper, we describe Boost.SIMD, a C++ template library that simplifies
Recursive Filtering on SIMD Architectures
- in Proceedings of IEEE Workshop on Signal Processing Systems 2003 (SIPS’03), Seoul, Korea
, 2003
"... Recursive filters are used frequently in digital signal processing. They can be implemented in dedicated hardware or in software on a digital signal processor (DSP). Software solutions often are preferable for their speed of implementation and flexibility. However, contemporary DSPs are mostly not f ..."
Cited by 3 (1 self)
SIMD Programming by Expansion
"... AC02-06CH11357. The U.S. Government retains for itself, and others acting on its behalf, a paid-up nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the G ..."
of the Government. SIMD Programming by Expansion. Since their advent 30 years ago, single-instruction multiple-data (SIMD) functional units have continued to provide an opportunity for high performance at a low hardware cost. However, a general consensus is that only a class of well-formed computations is suitable for SIMD