Results 1 - 10 of 428

Automatic Code Generation for SIMD Hardware Accelerators

by Serge Guelton, François Irigoin, Ronan Keryell
"... Abstract. SIMD hardware accelerators offer an alternative to manycores when energy consumption and performance are critical. For scientific computing, GPGPUs are used in many computers of the top-500. But embedded processors also use accelerators. However such heterogeneous platforms trade ease of d ..."

Liquid SIMD: Abstracting SIMD hardware using lightweight dynamic mapping

by Nathan Clark, Amir Hormati, Sami Yehia, Scott Mahlke - In HPCA’07, 2007
"... Microprocessor designers commonly utilize SIMD accelerators and their associated instruction set extensions to provide substantial performance gains at a relatively low cost for media applications. One of the most difficult problems with using SIMD accelerators is forward migration to newer genera ..."
Cited by 11 (0 self)
generations. With larger hardware budgets and more demands for performance, SIMD accelerators evolve with both larger data widths and increased functionality with each new generation. However, this causes difficult problems in terms of binary compatibility, software migration costs, and expensive redesign

The design and implementation of FFTW3

by Matteo Frigo, Steven G. Johnson - Proceedings of the IEEE, 2005
"... FFTW is an implementation of the discrete Fourier transform (DFT) that adapts to the hardware in order to maximize performance. This paper shows that such an approach can yield an implementation that is competitive with hand-optimized libraries, and describes the software structure that makes our cu ..."
Cited by 726 (3 self)

Compiling For SIMD Within A Register

by Randall J. Fisher, Henry G. Dietz - 11th Annual Workshop on Languages and Compilers for Parallel Computing (LCPC98), 1998
"... Although SIMD (Single Instruction stream Multiple Data stream) parallel computers have existed for decades, it is only in the past few years that a new version of SIMD has evolved: SIMD Within A Register (SWAR). Unlike other styles of SIMD hardware, SWAR models are tuned to be integrated withi ..."
Cited by 35 (1 self)

Sierra: A SIMD Extension for C++

by Roland Leißa, Immanuel Haffner, Sebastian Hack
"... Nowadays, SIMD hardware is omnipresent in computers. Nonetheless, many software projects make hardly use of SIMD instructions: Applications are usually written in general-purpose languages like C++. However, general-purpose languages only provide poor abstractions for SIMD programming enforcing an ..."

Asynchronous Problems on SIMD Parallel Computers

by Wei Shu, Min-you Wu - IEEE Trans. Parallel and Distributed Systems, 1995
"... Abstract. One of the essential problems in parallel computing is: can SIMD machines handle asynchronous problems? This is a difficult, unsolved problem because of the mismatch between asynchronous problems and SIMD architectures. We propose a solution to let SIMD machines handle general asynchronous ..."
Cited by 6 (0 self)
problems. Our approach is to implement a runtime support system which can run MIMD-like software on SIMD hardware. The runtime support system, named P kernel, is thread-based. There are two major advantages of the thread-based model. First, for application problems with irregular and/or unpredictable

Maximizing SIMD Resource Utilization in GPGPUs with SIMD Lane Permutation

by Minsoo Rhu, Mattan Erez - In 40th International Symposium on Computer Architecture (ISCA-40), 2013
"... Current GPUs maintain high programmability by abstracting the SIMD nature of the hardware as independent concurrent threads of control with hardware responsible for generating predicate masks to utilize the SIMD hardware for different flows of control. This dynamic masking leads to poor utilizati ..."
Cited by 3 (0 self)

Boost.SIMD: Generic Programming for Portable SIMDization

by Pierre Estérie, Mathias Gaunard, Joel Falcou, Jean-Thierry Lapresté, Brigitte Rozoy, 2012
"... Abstract. SIMD extensions have been a feature of choice for processor manufacturers for a couple of decades. Designed to exploit data parallelism in applications at the instruction level, these extensions still require a high level of expertise or the use of potentially fragile compiler support or v ..."
Cited by 3 (1 self)
or vendor-specific libraries. While a large fraction of their theoretical accelerations can be obtained using such tools, exploiting such hardware becomes tedious as soon as application portability across hardware is required. In this paper, we describe Boost.SIMD, a C++ template library that simplifies

Recursive Filtering on SIMD Architectures

by Rainer Schaffer, Michael Hosemann, Renate Merker, Gerhard Fettweis - in Proceedings of IEEE Workshop on Signal Processing Systems 2003 (SIPS’03), Seoul, Korea, 2003
"... Recursive filters are used frequently in digital signal processing. They can be implemented in dedicated hardware or in software on a digital signal processor (DSP). Software solutions often are preferable for their speed of implementation and flexibility. However, contemporary DSPs are mostly not f ..."
Cited by 3 (1 self)

SIMD Programming by Expansion

by Jaewook Shin
"... AC02-06CH11357. The U.S. Government retains for itself, and others acting on its behalf, a paid-up Nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the G ..."
of the Government. SIMD Programming by Expansion Since its advent 30 years ago, single-instruction multiple-data (SIMD) functional units continue to provide an opportunity for high performance at a low hardware cost. However, a general consensus is that only a class of well-formed computations is suitable for SIMD