MetiTarski: An Automatic Theorem Prover for Real-Valued Special Functions
Cited by 14 (2 self)
Many theorems involving special functions such as ln, exp and sin can be proved automatically by MetiTarski: a resolution theorem prover modified to call a decision procedure for the theory of real closed fields. Special functions are approximated by upper and lower bounds, which are typically rational functions derived from Taylor or continued fraction expansions. The decision procedure simplifies clauses by deleting literals that are inconsistent with other algebraic facts. MetiTarski simplifies arithmetic expressions by conversion to a recursive representation, followed by flattening of nested quotients. Applications include verifying hybrid and control systems.
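As a rough illustration of what "upper and lower bounds" for a special function can look like, the sketch below spot-checks one polynomial lower bound and one rational upper bound for exp on the interval [0, 2). The specific bounds and the helper names (`exp_lower`, `exp_upper`, `bounds_hold`) are assumptions chosen for illustration and are not taken from MetiTarski itself. A numeric spot check is also not a proof: the prover described above discharges such inequalities with a decision procedure.

```python
import math

def exp_lower(x: float) -> float:
    """Second-order Taylor polynomial: a lower bound for exp(x) when x >= 0."""
    return 1.0 + x + x * x / 2.0

def exp_upper(x: float) -> float:
    """Rational function (2 + x)/(2 - x): an upper bound for exp(x) on [0, 2)."""
    return (2.0 + x) / (2.0 - x)

def bounds_hold(points) -> bool:
    """Check lower(x) <= exp(x) <= upper(x) at each sample point."""
    return all(exp_lower(x) <= math.exp(x) <= exp_upper(x) for x in points)

if __name__ == "__main__":
    grid = [i / 10 for i in range(20)]   # 0.0, 0.1, ..., 1.9
    print(bounds_hold(grid))             # bounds are valid on [0, 2)
    print(bounds_hold([-1.0]))           # the same pair fails for x = -1
```

The second call shows why such bounds come with a validity range: outside [0, 2) a different pair of bounds would be needed.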
StReAm: Object-Oriented Programming of Stream Architectures using PAM-Blox
- Field-Programmable Logic and Applications, LNCS 1896
, 2000
Cited by 12 (1 self)
Simplifying the programming models is paramount to the success of reconfigurable computing. We apply the principles of object-oriented programming to the design of stream architectures for reconfigurable computing. The resulting tool, StReAm, is a domain-specific compiler on top of the object-oriented module generation environment PAM-Blox. Combining module generation with a high-level programming tool in C++ gives the programmer the convenience to explore the flexibility of FPGAs on the arithmetic level and write the algorithms in the same language and environment. Stream architectures consist of the pipelined dataflow graph mapped directly to hardware. Data streams through the implementation of the dataflow graph with only minimal control logic overhead. The main advantage of stream architectures is a clock frequency equal to the data rate, leading to very low power consumption.
Towards the post-ultimate libm
, 2005
Cited by 12 (7 self)
This article presents advances on the subject of correctly rounded elementary functions since the publication of the libultim mathematical library developed by Ziv at IBM. This library showed that the average performance and memory overhead of correct rounding could be made negligible. However, the worst-case overhead was still a factor of 1000 or more. It is shown here that, with current processor technology, this worst-case overhead can be kept within a factor of 2 to 10 of current best libms. This low overhead has very positive consequences on the techniques for implementing and proving correctly rounded functions, which are also studied. These results lift the last technical obstacles to a generalisation of (at least some) correctly rounded double precision elementary functions.
Assisted verification of elementary functions using Gappa
- In Proceedings of the 2006 ACM symposium on Applied computing
, 2006
Cited by 12 (5 self)
The implementation of a correctly rounded or interval elementary function needs to be proven carefully in the very last details. The proof requires a tight bound on the overall error of the implementation with respect to the mathematical function. Such work is function specific, concerns tens of lines of code for each function, and will usually be broken by the smallest change to the code (e.g. for maintenance or optimization purposes). Therefore, it is very tedious and error-prone if done by hand. This article discusses the use of the Gappa proof assistant in this context. Gappa has two main advantages over previous approaches: its input format is very close to the actual C code to validate, and it automates error evaluation and propagation using interval arithmetic. Besides, it can be used to incrementally prove complex mathematical properties pertaining to the C code. Yet it does not require any specific knowledge about automatic theorem proving, and thus is accessible to a wider community. Moreover, Gappa may generate a formal proof of the results that can be checked independently by a lower-level proof assistant like Coq, hence providing an even higher confidence in the certification of the numerical code.
Hierarchical segmentation schemes for function evaluation
- In IEEE Conference on Field-Programmable Technology
, 2003
Cited by 12 (4 self)
This paper presents a method for evaluating functions based on piecewise polynomial approximation with a novel hierarchical segmentation scheme. The use of a novel hierarchy scheme of uniform segments and segments with size varying by powers of two enables us to approximate nonlinear regions of a function particularly well. This partitioning is automated: efficient look-up tables and their coefficients are generated for a given function, input range, order of the polynomials, desired accuracy and finite precision constraints. We describe an algorithm to find the optimum number of segments and the placement of their boundaries, which is used to analyze the properties of a function and to benchmark our approach. Our method is illustrated using three non-linear compound functions, √(−log(x)), x log(x) and a high order rational function. We present results for various operand sizes between 8 and 24 bits for first and second order polynomial approximations.
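The idea of combining power-of-two segments with uniform sub-segments can be sketched in a few lines. The example below is only an illustration under several assumptions: it uses straight-line interpolation between segment endpoints rather than optimized polynomial coefficients, it works in double precision rather than finite-precision hardware arithmetic, and the domain [2^-8, 2^-1] and all function names are hypothetical choices, not the paper's algorithm. It compares the worst-case error of a two-level segmentation against plain uniform segmentation with the same number of segments for √(−log(x)).

```python
import math

def f(x: float) -> float:
    """One of the example functions named in the abstract: sqrt(-log(x))."""
    return math.sqrt(-math.log(x))

def hierarchical_boundaries(lo_exp: int, hi_exp: int, inner: int):
    """Outer segments [2**e, 2**(e+1)], each split into `inner` uniform parts."""
    points = []
    for e in range(lo_exp, hi_exp):
        a, b = 2.0 ** e, 2.0 ** (e + 1)
        points.extend(a + (b - a) * i / inner for i in range(inner))
    points.append(2.0 ** hi_exp)
    return points

def uniform_boundaries(lo: float, hi: float, count: int):
    """`count` equally sized segments covering [lo, hi]."""
    return [lo + (hi - lo) * i / count for i in range(count + 1)]

def max_linear_error(bounds, samples: int = 200) -> float:
    """Worst sampled error of endpoint-to-endpoint linear interpolation."""
    worst = 0.0
    for a, b in zip(bounds, bounds[1:]):
        fa, fb = f(a), f(b)
        for j in range(samples + 1):
            x = a + (b - a) * j / samples
            approx = fa + (fb - fa) * (x - a) / (b - a)
            worst = max(worst, abs(f(x) - approx))
    return worst

if __name__ == "__main__":
    hier = hierarchical_boundaries(-8, -1, inner=4)          # 28 segments
    unif = uniform_boundaries(2.0 ** -8, 2.0 ** -1, 28)      # 28 segments
    print("hierarchical:", max_linear_error(hier))
    print("uniform:     ", max_linear_error(unif))
```

Because the power-of-two segments shrink toward the steep region near zero, the two-level scheme spends its segments where the curvature is largest, which is the effect the abstract describes.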
Proposal for a standardization of mathematical function implementation in floating-point arithmetic
- Numerical Algorithms
, 2004
Return of the hardware floating-point elementary function
- In 18th Symposium on Computer Arithmetic. IEEE
, 2007
Cited by 12 (5 self)
The study of specific hardware circuits for the evaluation of floating-point elementary functions was once an active research area, until it was realized that these functions were not frequent enough to justify dedicating silicon to them. Research then turned to software functions. This situation may be about to change again with the advent of reconfigurable co-processors based on field-programmable gate arrays. Such co-processors now have a capacity that allows them to accommodate double-precision floating-point computing. Hardware operators for elementary functions targeted to such platforms have the potential to vastly outperform software functions, and will not permanently waste silicon resources. This article studies the optimization, for this target technology, of operators for the exponential and logarithm functions up to double-precision. These operators are freely available from www.ens-lyon.fr/LIP/Arenaire/. Keywords: Floating-point elementary functions, hardware
A parameterized floating-point exponential function for FPGAs
- In IEEE International Conference on Field-Programmable Technology (FPT’05)
, 2005
Cited by 11 (5 self)
A parameterized floating-point exponential operator is presented. In single-precision, it uses a small fraction of the FPGA’s resources and has a smaller latency than its software equivalent on a high-end processor, and ten times the throughput in its pipelined version. Previous work had shown that FPGAs could use massive parallelism to balance the poor performance of their basic floating-point operators compared to the equivalent in processors. As this work shows, when evaluating an elementary function, the flexibility of FPGAs provides much better performance than the processor without even resorting to parallelism.
Theorems on efficient argument reductions
- Proceedings of the 16th IEEE Symposium on Computer Arithmetic (ARITH16)
, 2003
Cited by 10 (4 self)
A commonly used argument reduction technique in elementary function computations begins with two positive floating point numbers α and γ that approximate the (usually, but not necessarily, irrational) numbers 1/C and C, e.g., C = 2π for trigonometric functions and C = ln 2 for $e^x$. Given an argument x to the function of interest, it extracts z as defined by $x\alpha = z + \varsigma$ with $z = k\,2^{-N}$ and $|\varsigma| \le 2^{-N-1}$, where k and N are integers and N ≥ 0 is preselected, and then computes u = x − zγ. Usually zγ takes more bits than the working precision provides for storing its significand, and thus the exact x − zγ may not be representable by a floating point number of the same precision. This causes a performance penalty when the working precision is the highest available on the underlying hardware, since considerable extra work is then needed to get all the bits of x − zγ right. This paper presents theorems showing that, under mild conditions that are easily met on today’s computer hardware and that still allow α ≈ 1/C and γ ≈ C to almost the full working precision, x − zγ is a floating point number of the same precision. An algorithmic procedure based on the theorems is obtained. The results will enhance performance, in particular on machines that have hardware support for fused multiply-add (fma) instructions.
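The reduction step described above can be sketched for the exponential case, C = ln 2. This is a minimal illustration under stated assumptions, not the paper's procedure or its theorems: γ is simply truncated to 32 significand bits so that the product zγ fits in a double for moderate arguments, N is fixed at 5, and the names `truncate_bits` and `reduce_argument` are hypothetical. Exact rational arithmetic is used only to check whether the floating-point result u equals the true value of x − zγ.

```python
import math
from fractions import Fraction

def truncate_bits(v: float, bits: int) -> float:
    """Keep only the top `bits` significand bits of v (round toward zero)."""
    m, e = math.frexp(v)                      # v = m * 2**e with 0.5 <= |m| < 1
    scaled = math.trunc(m * (1 << bits))      # exact: scaling by a power of two
    return math.ldexp(float(scaled), e - bits)

def reduce_argument(x: float, n: int = 5, gamma_bits: int = 32):
    """Return (z, u, exact) with z = k * 2**-n and u = x - z*gamma."""
    c = math.log(2.0)
    alpha = 1.0 / c                           # alpha approximates 1/C
    gamma = truncate_bits(c, gamma_bits)      # gamma approximates C, short significand
    k = round(x * alpha * 2 ** n)             # nearest integer, so |x*alpha - z| <= 2**-(n+1)
    z = math.ldexp(float(k), -n)
    u = x - z * gamma                         # computed in double precision
    exact = Fraction(x) - Fraction(z) * Fraction(gamma)
    return z, u, Fraction(u) == exact

if __name__ == "__main__":
    for x in (10.0, 0.3, 100.5, -7.25):
        z, u, exact = reduce_argument(x)
        # exp(x) is then approximately 2**z * exp(u), with u small.
        print(x, z, u, exact)
```

Shortening γ trades a little accuracy in γ ≈ C for an exactly representable product; the theorems summarized in the abstract concern conditions under which x − zγ is exact while α and γ keep almost full working precision.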
A Hardware Gaussian Noise Generator Using the Box-Muller Method and Its Error Analysis
- IEEE Trans. Computers
, 2006
Cited by 9 (6 self)
We present a hardware Gaussian noise generator based on the Box-Muller method that provides highly accurate noise samples. The noise generator can be used as a key component in a hardware-based simulation system, such as for exploring channel code behavior at very low bit error rates, as low as $10^{-12}$ to $10^{-13}$. The main novelties of this work are accurate analytical error analysis and bit-width optimization for the elementary functions involved in the Box-Muller method. Two 16-bit noise samples are generated every clock cycle and, due to the accurate error analysis, every sample is analytically guaranteed to be accurate to one unit in the last place. An implementation on a Xilinx Virtex-4 XC4VLX100-12 FPGA occupies 1,452 slices, three block RAMs, and 12 DSP slices, and is capable of generating 750 million samples per second at a clock speed of 375 MHz. The performance can be improved by exploiting concurrent execution: 37 parallel instances of the noise generator at 95 MHz on a Xilinx Virtex-II Pro XC2VP100-7 FPGA generate seven billion samples per second and can run over 200 times faster than the output produced by software running on an Intel Pentium-4 3 GHz PC. The noise generator is currently being used at the Jet Propulsion Laboratory, NASA to evaluate the performance of low-density parity-check codes for deep-space communications. Index Terms: algorithms implemented in hardware, computer arithmetic, error analysis, elementary function approximation, field programmable gate arrays, minimax approximation and algorithms, optimization, random number generation, simulation.
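For readers unfamiliar with the Box-Muller method, a plain software version is sketched below. It is only a double-precision reference in Python and says nothing about the fixed-point hardware design, bit widths, or error analysis of the paper; the function names and the seeded generator are assumptions made for the example. It does show which elementary functions are involved (logarithm, square root, sine, cosine) and why each pair of uniform inputs yields two Gaussian samples.

```python
import math
import random

def box_muller(u1: float, u2: float):
    """Map two uniform samples (u1 in (0, 1], u2 in [0, 1)) to two N(0, 1) samples."""
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)

def gaussian_samples(pairs: int, seed: int = 0):
    """Generate 2 * pairs Gaussian samples from a seeded uniform generator."""
    rng = random.Random(seed)
    out = []
    for _ in range(pairs):
        u1 = 1.0 - rng.random()   # shift [0, 1) to (0, 1] so log(u1) is finite
        u2 = rng.random()
        out.extend(box_muller(u1, u2))
    return out

if __name__ == "__main__":
    samples = gaussian_samples(100_000)
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    print(f"mean={mean:.4f} variance={var:.4f}")
```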

