Reconfigurable computing: architectures and design methods
IEE Proceedings - Computers and Digital Techniques, 2005
Cited by 17 (1 self)
Reconfigurable computing is becoming increasingly attractive for many applications. This survey covers two aspects of reconfigurable computing: architectures and design methods. The paper includes recent advances in reconfigurable architectures, such as the Altera Stratix II and Xilinx Virtex 4 FPGA devices. The authors identify major trends in general-purpose and special-purpose
Pipelined mixed precision algorithms on FPGAs for fast and accurate PDE solvers from low precision components
In IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), 2006
Cited by 9 (2 self)
FPGAs are becoming more and more attractive for high precision scientific computations. One of the main problems in efficient resource utilization is the quadratically growing resource usage of multipliers depending on the operand size. Many research efforts have been devoted to the optimization of individual arithmetic and linear algebra operations. In this paper we take a higher level approach and seek to reduce the intermediate computational precision on the algorithmic level by optimizing the accuracy towards the final result of an algorithm. In our case this is the accurate solution of partial differential equations (PDEs). Using the Poisson Problem as a typical PDE example we show that most intermediate operations can be computed with floats or even smaller formats and only very few operations (e.g. 1%) must be performed in double precision to obtain the same accuracy as a full double precision solver. Thus the FPGA can be configured with many parallel float rather than few resource-hungry double operations. To achieve this, we adapt the general concept of mixed precision iterative refinement methods to FPGAs and develop a fully pipelined version of the Conjugate Gradient solver. We combine this solver with different iterative refinement schemes and precision combinations to obtain resource efficient mappings of the pipelined algorithm core onto the FPGA.
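The mixed precision iterative refinement idea in this abstract can be sketched in software. The following is a minimal illustration, not the paper's pipelined FPGA design: it assumes a 1D Poisson matrix tridiag(-1, 2, -1), emulates single precision by rounding through `struct`, and all function names (`f32`, `cg_single`, `mixed_precision_solve`, and so on) are hypothetical. Only the residual and the solution update run in double precision; the Conjugate Gradient inner solver runs entirely in emulated single precision.

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest IEEE 754 single."""
    return struct.unpack("f", struct.pack("f", x))[0]

def identity(x):
    return x

def poisson_matvec(v, rnd=identity):
    """y = A v for the 1D Poisson matrix tridiag(-1, 2, -1); rnd rounds each result."""
    n = len(v)
    y = []
    for i in range(n):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < n - 1 else 0.0
        y.append(rnd(rnd(2.0 * v[i] - left) - right))
    return y

def dot(a, b, rnd=identity):
    s = 0.0
    for x, y in zip(a, b):
        s = rnd(s + rnd(x * y))
    return s

def cg_single(b, max_iters, tol=1e-7):
    """Conjugate Gradient for A x = b with every result rounded to single."""
    x = [0.0] * len(b)
    r = [f32(v) for v in b]
    p = list(r)
    rr = dot(r, r, f32)
    stop = tol * tol * rr
    for _ in range(max_iters):
        if rr <= stop:
            break
        Ap = poisson_matvec(p, f32)
        pAp = dot(p, Ap, f32)
        if pAp <= 0.0:
            break
        alpha = f32(rr / pAp)
        x = [f32(xi + f32(alpha * pi)) for xi, pi in zip(x, p)]
        r = [f32(ri - f32(alpha * ai)) for ri, ai in zip(r, Ap)]
        rr_new = dot(r, r, f32)
        beta = f32(rr_new / rr)
        p = [f32(ri + f32(beta * pi)) for ri, pi in zip(r, p)]
        rr = rr_new
    return x

def mixed_precision_solve(b, outer_iters=6):
    """Iterative refinement: double residual and update, single inner solve."""
    n = len(b)
    x = [0.0] * n
    for _ in range(outer_iters):
        r = [bi - yi for bi, yi in zip(b, poisson_matvec(x))]  # double precision
        scale = max(abs(v) for v in r)
        if scale == 0.0:
            break
        # Solve A c = r / scale in single precision, then correct x in double.
        c = cg_single([v / scale for v in r], max_iters=4 * n)
        x = [xi + scale * ci for xi, ci in zip(x, c)]
    return x

def max_residual(x, b):
    """Largest entry of |b - A x|, evaluated in double precision."""
    return max(abs(bi - yi) for bi, yi in zip(b, poisson_matvec(x)))
```

Calling `mixed_precision_solve(b)` and `cg_single(b, 64)` on the same right-hand side and comparing `max_residual` for both shows the gap between the refined and the single-only solution.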
Customisable hardware compilation
In Proceedings of the International Conference on Engineering of Reconfigurable Systems and Algorithms, ERSA ’04, 2004
Cited by 4 (2 self)
Hardware compilers for high-level languages are increasingly recognised to be the key to reducing the productivity gap for advanced circuit development in general, and for reconfigurable designs in particular. This paper explains how customisable frameworks for hardware compilation can enable rapid design exploration, and reusable and extensible hardware optimisation. It describes such a framework, based on a parallel imperative language, which supports multiple levels of design abstraction, transformational development, optimisation by compiler passes, and metalanguage facilities. Our approach has been used in producing designs for applications such as signal and image processing, with different trade-offs in performance and resource usage.
Adaptive range reduction for hardware function evaluation
In Proc. IEEE Int’l Conf. on Field-Programmable Technology, 2004
Cited by 3 (1 self)
Function evaluation f(x) typically consists of range reduction and the actual function evaluation on a small interval. In this paper, we investigate optimization of range reduction given the range and precision of x and f(x). For every function evaluation there exists a convenient interval such as [0,π/2) for sin(x). The adaptive range reduction method, which we propose in this work, involves deciding whether range reduction can be used effectively for a particular design. The decision depends on the function being evaluated, precision, and optimization metrics such as area, latency and throughput. In addition, the input and output range has an impact on the preferable function evaluation method such as polynomial, table-based, or combinations of the two. We explore this vast design space of adaptive range reduction for fixed-point sin(x), log(x) and √x accurate to one unit in the last place using MATLAB and ASC, A Stream Compiler. These tools enable us to study over 1000 designs resulting in over 40 million Xilinx equivalent circuit gates, in a few hours' time. The final objective is to progress towards a fully automated library that provides optimal function evaluation hardware units given input/output range and precision.
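The two-stage structure described here (range reduction, then evaluation on a small interval) can be illustrated for sin(x). This is a floating-point software sketch under stated assumptions, not the fixed-point hardware designs or the adaptive decision procedure from the paper: the reduction maps x onto [0, π/2] using the symmetries of sine, and the evaluator is a degree-9 Taylor polynomial chosen only for brevity. The function names are hypothetical.

```python
import math

def reduce_to_first_quadrant(x):
    """Return (r, sign) with r in [0, pi/2] and sin(x) == sign * sin(r)."""
    t = math.fmod(x, 2.0 * math.pi)
    if t < 0.0:
        t += 2.0 * math.pi          # now t is in [0, 2*pi]
    sign = 1.0
    if t >= math.pi:
        t -= math.pi                # sin(t) = -sin(t - pi)
        sign = -1.0
    if t > math.pi / 2.0:
        t = math.pi - t             # sin(t) = sin(pi - t)
    return t, sign

def sin_poly(r):
    """Degree-9 Taylor polynomial in Horner form; adequate only for small |r|."""
    r2 = r * r
    return r * (1.0 - r2 / 6.0 * (1.0 - r2 / 20.0 * (1.0 - r2 / 42.0 * (1.0 - r2 / 72.0))))

def sin_reduced(x):
    """Evaluate sin(x) by range reduction followed by the small-interval polynomial."""
    r, sign = reduce_to_first_quadrant(x)
    return sign * sin_poly(r)
```

Applying `sin_poly` directly to a large argument such as 10.0 gives a badly wrong value, while `sin_reduced` stays close to `math.sin`. The cost is the extra reduction logic, which is the trade-off the abstract says must be weighed per design.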
Evaluation of Static Analysis Techniques for Fixed-Point Precision Optimization
Cited by 3 (1 self)
Precision analysis and optimization is very important when transforming a floating-point algorithm into fixed-point hardware implementations. The core analysis techniques are either based on dynamic analysis or static analysis. We believe in static error analysis, as it is the only technique that can guarantee the desired worst-case accuracy. In this paper we study various underlying arithmetic candidates that can be used in static error analysis and compare their computed sensitivities. The approaches studied include Affine Arithmetic (AA), General Interval Arithmetic (GIA) and Automatic Differentiation (Symbolic Arithmetic). Our study shows that the symbolic method is preferred for expressions with higher order cancelation. For programs without strong cancelation, any method works fairly well and GIA slightly outperforms others. We also study the impact of program transformations on these arithmetics.
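Why cancelation matters for the choice of arithmetic can be seen in a small example. The sketch below compares plain interval arithmetic with a minimal affine form on the expression (x + y) - x. It supports only addition and subtraction, ignores floating-point rounding of the bounds, and does not implement GIA or the symbolic method studied in the paper. The class names `Interval` and `Affine` are illustrative.

```python
import itertools

class Interval:
    """Plain interval [lo, hi]; operands are treated as independent."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def width(self):
        return self.hi - self.lo

class Affine:
    """Affine form: center + sum(coeff_k * eps_k), each eps_k in [-1, 1]."""
    _fresh = itertools.count()  # source of new noise-symbol ids

    def __init__(self, center, terms):
        self.center = center
        self.terms = terms      # noise-symbol id -> coefficient

    @classmethod
    def from_range(cls, lo, hi):
        return cls((lo + hi) / 2.0, {next(cls._fresh): (hi - lo) / 2.0})

    def _combine(self, other, sign):
        terms = dict(self.terms)
        for key, coeff in other.terms.items():
            terms[key] = terms.get(key, 0.0) + sign * coeff
        return Affine(self.center + sign * other.center, terms)

    def __add__(self, other):
        return self._combine(other, 1.0)

    def __sub__(self, other):
        return self._combine(other, -1.0)

    def to_interval(self):
        radius = sum(abs(c) for c in self.terms.values())
        return Interval(self.center - radius, self.center + radius)

# x in [1, 2], y in [0, 0.5]; the true range of (x + y) - x is [0, 0.5].
ix, iy = Interval(1.0, 2.0), Interval(0.0, 0.5)
ax, ay = Affine.from_range(1.0, 2.0), Affine.from_range(0.0, 0.5)
interval_result = (ix + iy) - ix
affine_result = ((ax + ay) - ax).to_interval()
```

Interval arithmetic forgets that both occurrences of x are the same quantity and returns [-1, 1.5]. The affine form keeps the shared noise symbol, so the x terms cancel and the result is the exact range [0, 0.5].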
Trustworthy Numerical Computation in Scala
Cited by 1 (1 self)
Modern computing has adopted the floating point type as a default way to describe computations with real numbers. Thanks to dedicated hardware support, such computations are efficient on modern architectures, even in double precision. However, rigorous reasoning about the resulting programs remains difficult. This is in part due to a large gap between the finite floating point representation and the infinite-precision real-number semantics that serves as the developers’ mental model. Because programming languages do not provide support for estimating errors, some computations in practice are performed more and some less precisely than needed. We present a library solution for rigorous arithmetic computation. Our numerical data type library tracks a (double) floating point value, but also a guaranteed upper bound on the error between this value and the ideal value that would be computed in the real-value semantics. Our implementation involves a set of linear approximations based on an extension of affine arithmetic. The derived approximations cover most of the standard mathematical operations, including trigonometric functions, and are more comprehensive than any publicly available ones. Moreover, while interval arithmetic rapidly yields overly pessimistic estimates, our approach remains precise for several computational tasks of interest. We evaluate the library on a number of examples from numerical analysis and physical simulations. We found it to be a useful tool for gaining confidence in the correctness of the computation.
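The idea of a numeric type that carries its own error information alongside the double value can be sketched as follows. Note the substitution: instead of the affine-arithmetic approximations the abstract describes, this sketch carries an exact rational "ideal" value using Python's `fractions.Fraction`. That works only for rational operations (here +, -, *) and yields the exact error rather than an upper bound. The class name `Tracked` and its methods are hypothetical and are not the library's API.

```python
from fractions import Fraction

class Tracked:
    """A double together with the exact rational value it is meant to represent."""

    def __init__(self, value, ideal):
        self.value = value   # float: what the hardware actually computes
        self.ideal = ideal   # Fraction: the real-number semantics

    @classmethod
    def from_decimal(cls, text):
        # float("0.1") is rounded to binary; Fraction("0.1") is exactly 1/10.
        return cls(float(text), Fraction(text))

    def __add__(self, other):
        return Tracked(self.value + other.value, self.ideal + other.ideal)

    def __sub__(self, other):
        return Tracked(self.value - other.value, self.ideal - other.ideal)

    def __mul__(self, other):
        return Tracked(self.value * other.value, self.ideal * other.ideal)

    def error(self):
        """Exact absolute error of the float against the ideal value."""
        return abs(Fraction(self.value) - self.ideal)

# Add one tenth ten times: the ideal result is exactly 1.
tenth = Tracked.from_decimal("0.1")
total = Tracked.from_decimal("0")
for _ in range(10):
    total = total + tenth
```

After the loop, `total.ideal` is exactly 1 while `total.value` differs from it by a small nonzero amount, which `total.error()` reports. A rational shadow value grows without bound in size and cannot express functions such as sin, which is one reason a bounded approximation like the affine forms mentioned in the abstract is attractive.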
Profile-directed speculative optimization of reconfigurable floating point data paths
This paper presents a methodology for generating floating-point arithmetic hardware designs which are, for suitable applications, dramatically reduced in size, while still retaining performance. We use a profiling tool for floating-point value ranges to identify arithmetic operations where the shifting required for alignment and normalisation is almost always small. We synthesise hardware with reduced-size barrel-shifters, but always detect when operands lie outside the range this optimised hardware can handle. These rare out-of-range operations are handled by a separate full floating-point implementation, either on-chip or by returning calculations to the host. Thus the system suffers no compromise in IEEE 754 compliance. This paper presents results for two benchmark applications which profiling suggested would be profitable. We demonstrate the potential for this technique to yield an increase in parallel computing power of up to 43%, with a (correctable) error rate of less than 5%.
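The control flow of this speculative scheme can be modelled in a few lines. The sketch below is a software model under simplifying assumptions, not the synthesised hardware: it considers only the alignment shift of an addition (the exponent difference of the two operands), ignores the normalisation shift, and uses a hypothetical `max_shift` parameter for the width the reduced shifter could handle. Both paths return the same correct sum; only the flag records which datapath would have been needed.

```python
import math

def alignment_shift(a, b):
    """Exponent difference of a and b: the shift needed to align significands."""
    _, exp_a = math.frexp(a)
    _, exp_b = math.frexp(b)
    return abs(exp_a - exp_b)

def speculative_add(a, b, max_shift):
    """Return (sum, used_fallback) for an adder whose shifter handles max_shift bits."""
    if a == 0.0 or b == 0.0 or alignment_shift(a, b) <= max_shift:
        return a + b, False   # the reduced-size datapath would suffice
    return a + b, True        # out of range: detected and sent to the full unit

def fallback_rate(pairs, max_shift):
    """Fraction of profiled operand pairs that would need the full implementation."""
    fallbacks = sum(1 for a, b in pairs if speculative_add(a, b, max_shift)[1])
    return fallbacks / len(pairs)
```

Running `fallback_rate` over operand pairs recorded from an application, for several values of `max_shift`, gives the kind of profile that indicates whether a reduced shifter would be profitable for that application.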
Real-Number Optimisation: A Speculative, Profile-Guided Approach
2007
From supercomputers for computational science to embedded processors in mobile phones, most important computing applications manipulate the set of real numbers, R. How these numbers are represented varies, with embedded applications picking fixed-point formats compatible with integer operations and larger machines using IEEE-754 floating point or a close variant. A large body of work describes methods for optimising floating point representations using static analysis techniques; however, these must always take a conservative approach if they intend to ensure correctness. Taking our inspiration from work on speculative execution and profile-guided compiler optimisations, we lay out a series of tools and techniques to produce optimised real-number representations. Our speculative approach aims for greater reductions in hardware area and execution time than with more conservative approaches, while providing fall-back options to ensure correctness in case of incorrect speculation. We describe a profiling tool for x86 binaries which reveals bucketised value ranges for floating-point operations within applications. A selection of profiling results for real-world scientific
Speculative Reduction of Floating Point
This paper presents a methodology for generating floating-point arithmetic hardware designs which are, for suitable applications, dramatically reduced in size, while still retaining performance. We use a profiling tool for floating-point value ranges to identify arithmetic operations where the shifting required for operand alignment is almost always small. We synthesise hardware with reduced-size barrel-shifters, but always detect when operands lie outside the range this optimised hardware can handle. These rare out-of-range operations are handled by a separate full floating-point implementation, either on-chip or by returning calculations to the host. Thus the system suffers no compromise in IEEE 754 compliance. This paper presents results for two benchmark applications which profiling suggested would be profitable. We demonstrate the potential for this technique to yield an increase in parallel computing power of up to 43%, with a (correctable) error rate of less than 5%. We profile a number of other applications and comment on their suitability for our technique.
A Computational Approach to Custom Data Representation for Hardware Accelerators
2010
This thesis details the application of computational methods to the problem of determining custom data representations when building hardware accelerators for numerical computations. A majority of scientific applications which require hardware acceleration are implemented in IEEE-754 double precision. However, in many cases the error tolerance requirements of the application are much less than the accuracy which IEEE-754 double precision provides. By leveraging custom data representations, a more resource efficient hardware implementation arises thereby enabling greater parallelism and thus higher performance of the accelerator. The existing custom representation methods are unable to guarantee robust representations while at the same time adequately supporting ill-conditioned operators. Support for both of these scenarios is necessary for accelerating scientific calculations. To address this, we propose the use of a computational method based on Satisfiability-Modulo Theory (SMT). By capturing a calculation as a set of constraints, an SMT instance can be formulated which provides meaningful bounds even in the presence of ill-conditioned operators.

