Results 1–10 of 11,532
64-bit floating-point FPGA matrix multiplication
In ACM/SIGDA Field-Programmable Gate Arrays, 2005
"... We introduce a 64-bit ANSI/IEEE Std 754-1985 floating-point design of a hardware matrix multiplier optimized for FPGA implementations. A general block matrix multiplication algorithm, applicable for an arbitrary matrix size, is proposed. The algorithm potentially enables optimum performance by exploi ..."
Cited by 48 (6 self)
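The general block matrix multiplication idea this abstract describes — partitioning arbitrary-size matrices into tiles and accumulating tile-by-tile partial products — can be sketched in software. This is an illustrative NumPy sketch, not the paper's hardware design; the function name and tile size `bs` are assumptions:

```python
import numpy as np

def block_matmul(A, B, bs=2):
    """Blocked (tiled) product C = A @ B for arbitrary matrix sizes.

    Partitions A, B, and C into bs x bs tiles and accumulates one
    tile-pair partial product at a time; NumPy slicing handles the
    ragged edge tiles when the dimensions are not multiples of bs.
    """
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m))
    for i in range(0, n, bs):
        for j in range(0, m, bs):
            for p in range(0, k, bs):
                # Accumulate the partial product of one tile pair.
                C[i:i+bs, j:j+bs] += A[i:i+bs, p:p+bs] @ B[p:p+bs, j:j+bs]
    return C

A = np.random.rand(5, 7)
B = np.random.rand(7, 3)
assert np.allclose(block_matmul(A, B, bs=2), A @ B)
```

In a hardware realization the tile loop order and tile size would be chosen to match on-chip memory and the pipeline depth; here they are arbitrary.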
Improved Floating-Point Matrix Multiplier
"... Abstract – The floating-point matrix multiplier is widely used in scientific computations. A great deal of effort has been made to achieve higher performance. Matrix multiplication consists of many multiplications and accumulations. Yang and Duh proposed a modular design of floating-point matrix mu ..."
Algorithms for Nonnegative Matrix Factorization
In NIPS, 2001
"... Nonnegative matrix factorization (NMF) has previously been shown to be a useful decomposition for multivariate data. Two different multiplicative algorithms for NMF are analyzed. They differ only slightly in the multiplicative factor used in the update rules. One algorithm can be shown to minim ..."
Cited by 1246 (5 self)
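The multiplicative update rules this abstract analyzes can be sketched directly; the variant below minimizes the Frobenius norm ||V − WH||, one of the two objectives the paper treats. The rank `r`, iteration count, and `eps` smoothing are illustrative choices, not values from the paper:

```python
import numpy as np

def nmf_multiplicative(V, r, iters=200, eps=1e-9):
    """Lee-Seung multiplicative updates for NMF under the
    Euclidean objective ||V - WH||_F^2.

    Each factor is scaled elementwise by a multiplicative term;
    nonnegativity is preserved automatically because every factor
    in the update is nonnegative.
    """
    rng = np.random.default_rng(0)
    n, m = V.shape
    W = rng.random((n, r)) + eps
    H = rng.random((r, m)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # update rule for H
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # update rule for W
    return W, H

V = np.abs(np.random.default_rng(1).random((6, 5)))
W, H = nmf_multiplicative(V, r=2)
```

The divergence-based variant mentioned in the abstract differs only in the multiplicative factor applied at each step.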
Accurate Matrix Multiplication with Multiple Floating-Point Numbers
"... Abstract—This paper is concerned with the accurate computation of matrix multiplication, where the components of the matrices are represented by sums of floating-point numbers. Recently, an accurate summation algorithm was developed by the latter three of the authors. In this paper, it is specialized to ..."
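Representing a value as a sum of floating-point numbers rests on error-free transformations: an addition whose rounding error is itself captured as a second float. A minimal sketch using Knuth's TwoSum follows; the helper names are assumptions, and this is not the authors' specialized summation algorithm:

```python
def two_sum(a, b):
    """Knuth's error-free transformation: returns (s, e) with
    s = fl(a + b) and s + e == a + b exactly."""
    s = a + b
    bp = s - a
    e = (a - (s - bp)) + (b - bp)
    return s, e

def accurate_sum(xs):
    """Compensated summation: accumulate the rounding error of
    every addition and fold it back in at the end, so values
    lost to rounding in the running sum are recovered."""
    s = 0.0
    err = 0.0
    for x in xs:
        s, e = two_sum(s, x)
        err += e
    return s + err

vals = [1e16, 1.0, -1e16, 1.0]
print(accurate_sum(vals))  # 2.0, whereas naive sum(vals) gives 1.0
```

The same idea extends to dot products and hence matrix multiplication: each product is split into a high and low part, and the parts are summed without losing the low-order contributions.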
The control of the false discovery rate in multiple testing under dependency
Annals of Statistics, 2001
"... Benjamini and Hochberg suggest that the false discovery rate may be the appropriate error rate to control in many applied multiple testing problems. A simple procedure was given there as an FDR controlling procedure for independent test statistics and was shown to be much more powerful than comparab ..."
Cited by 1093 (16 self)
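The simple procedure referred to here is the Benjamini–Hochberg step-up rule: find the largest k with p_(k) ≤ (k/m)·q and reject the k hypotheses with the smallest p-values. A sketch (function name and example p-values are illustrative):

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false
    discovery rate at level q for independent test statistics.

    Returns a boolean rejection decision per hypothesis, in the
    original input order.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Step-up: keep the LARGEST rank whose p-value clears
        # its threshold, even if smaller ranks failed theirs.
        if pvals[i] <= rank / m * q:
            k_max = rank
    rejected = set(order[:k_max])
    return [i in rejected for i in range(m)]

p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(p, q=0.05))
```

Note the step-up character: a hypothesis can be rejected even when its own p-value exceeds its per-rank threshold, provided some larger rank clears its threshold.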
Energy Performance of Floating-Point Matrix Multiplication on FPGAs
"... Floating-point matrix multiplication is a basic kernel in scientific computing. It has been shown that implementations of this kernel on FPGAs can achieve high sustained performance [1]. However, to the best of our knowledge, existing work on FPGA-based floating-point matrix multiplication considers ..."
Capacity of a Mobile Multiple-Antenna Communication Link in Rayleigh Flat Fading
"... We analyze a mobile wireless link comprising M transmitter and N receiver antennas operating in a Rayleigh flat-fading environment. The propagation coefficients between every pair of transmitter and receiver antennas are statistically independent and unknown; they remain constant for a coherence int ..."
Cited by 495 (22 self)
signals. We prove that there is no point in making the number of transmitter antennas greater than the length of the coherence interval: the capacity for M > T is equal to the capacity for M = T. Capacity is achieved when the T × M transmitted signal matrix is equal to the product of two statistically
Scalable and Modular Algorithms for Floating-Point Matrix Multiplication on
"... The abundant hardware resources on current FPGAs provide new opportunities to improve the performance of hardware implementations of scientific computations. In this paper, we propose two FPGA-based algorithms for floating-point matrix multiplication, a fundamental kernel in a number of scientific a ..."
FPGA Accelerator for Floating-Point Matrix Multiplication
IET Computers & Digital Techniques, 2012
"... This study treats the architecture and implementation of an FPGA accelerator for double-precision floating-point matrix multiplication. The architecture is oriented towards minimising resource utilisation and maximising clock frequency. It employs the block matrix multiplication algorithm which returns t ..."
Cited by 3 (0 self)
A Data Locality Optimizing Algorithm
1991
"... This paper proposes an algorithm that improves the locality of a loop nest by transforming the code via interchange, reversal, skewing and tiling. The loop transformation algorithm is based on two concepts: a mathematical formulation of reuse and locality, and a loop transformation theory that unifi ..."
Abstract

Cited by 804 (16 self)
that unifies the various transforms as unimodular matrix transformations. The algorithm has been implemented in the SUIF (Stanford University Intermediate Format) compiler, and is successful in optimizing codes such as matrix multiplication, successive over-relaxation (SOR), and LU decomposition without pivoting
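The tiling transformation this abstract describes can be illustrated on the matrix multiplication loop nest it names; the sketch below shows the transformation itself, not the SUIF implementation, and the tile size `T` is an arbitrary choice:

```python
def matmul_naive(A, B, n):
    # Plain i-j-k loop nest over n x n lists: B is traversed
    # column-wise in the innermost loop, striding across rows,
    # which gives poor spatial locality on large matrices.
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

def matmul_tiled(A, B, n, T=4):
    # Tiled version: the iteration space is blocked into T x T x T
    # tiles so each tile of A, B, and C is fully reused before the
    # computation moves on, keeping the working set cache-sized.
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, T):
        for jj in range(0, n, T):
            for kk in range(0, n, T):
                for i in range(ii, min(ii + T, n)):
                    for k in range(kk, min(kk + T, n)):
                        a = A[i][k]  # hoist the invariant A element
                        for j in range(jj, min(jj + T, n)):
                            C[i][j] += a * B[k][j]
    return C
```

Both loop nests compute the same result; the payoff of tiling shows up as cache behavior in compiled code rather than in Python, where this serves only to make the transformation concrete.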