
## Submitted by

### Citations

2222 | Iterative methods for sparse linear systems
- Saad
- 2003
Citation Context: ...ed on obtaining a numerically approximate solution for a given mathematical model of a structure. The resulting linear system is characterized by the system matrix A, which is usually large and sparse [29,37,83,98]. In general, iterative solvers such as the Conjugate Gradient (CG) method are almost entirely dominated by SMVM operations (usually more than 95%). The CG method is the most popular iterative method for num...
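
The context above says CG-type solvers spend most of their work in sparse matrix-vector multiplication (SMVM). As a rough illustration only, here is a minimal Python sketch of SMVM with the matrix stored in Compressed Sparse Row (CSR) form. CSR and the names `csr_matvec`, `values`, `col_idx` and `row_ptr` are illustrative assumptions, not the storage format or interface used in the cited works.

```python
from typing import List, Sequence


def csr_matvec(values: Sequence[float], col_idx: Sequence[int],
               row_ptr: Sequence[int], x: Sequence[float]) -> List[float]:
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    values  -- nonzero entries of A, stored row by row
    col_idx -- column index of each entry in `values`
    row_ptr -- row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Only the stored nonzeros of row i contribute to y[i].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


if __name__ == "__main__":
    # A = [[4, 0, 1],
    #      [0, 3, 0],
    #      [1, 0, 2]]
    values = [4.0, 1.0, 3.0, 1.0, 2.0]
    col_idx = [0, 2, 1, 0, 2]
    row_ptr = [0, 2, 3, 5]
    print(csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0]))  # [7.0, 6.0, 7.0]
```

The work is one multiply-add per stored nonzero, so the cost scales with the number of nonzeros rather than with the full matrix size; the irregular accesses to `x` are what make SMVM hard to accelerate.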

1356 | Cramming more components onto integrated circuits. Electronics
- Moore
- 1965
Citation Context: ...In 1965 Gordon E. Moore predicted a long-term trend in computing hardware, in which the number of transistors that can be placed on an integrated circuit doubles approximately every two years [80]. Note that this is often incorrectly cited as a doubling of transistors every 18 months. The actual average period is about 20 months. Historically, Moore’s Law has precisely described a driving force ...
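
The differences between the doubling periods quoted above (two years, 18 months, about 20 months) compound quickly. A small worked example, assuming a pure exponential model N(t) = N(0) · 2^(t/T) with doubling period T; this is a simplification for illustration, not a claim from the cited text, and `growth_factor` is a hypothetical helper.

```python
def growth_factor(years: float, doubling_months: float) -> float:
    """Return N(t) / N(0) for a count that doubles every `doubling_months` months."""
    return 2.0 ** (years * 12.0 / doubling_months)


if __name__ == "__main__":
    # Compare the three periods mentioned in the text over one decade.
    for months in (18, 20, 24):
        print(f"T = {months} months: x{growth_factor(10, months):.1f} in 10 years")
```

Over ten years this gives roughly 101.6× for 18 months, 64× for 20 months and 32× for 24 months, so the "18 months" misquotation overstates decade-scale growth by about a factor of three.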

1093 | Methods of conjugate gradients for solving linear systems
- Hestenes, Stiefel
- 1952
Citation Context: ...sually more than 95%). The CG method is the most popular iterative method for numerically solving a system of linear equations A · x = b, where A is a Symmetric Positive-Definite (SPD) sparse matrix [52,77]. Moreover, Google’s PageRank (PR) eigenvalue problem is considered to be the world’s largest sparse matrix calculation. This problem is also dominated by SMVM operations, where the target matrix is ex...
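
To make the link between CG and SMVM concrete, a textbook (unpreconditioned) Hestenes–Stiefel CG iteration is sketched below. It is a minimal sketch that assumes dense Python lists for brevity; `conjugate_gradient`, `dot` and `matvec` are hypothetical names, and `matvec` marks the single place per iteration where an SMVM kernel would be called.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(matvec: Callable[[Vector], Vector], b: Vector,
                       tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    """Solve A x = b for a symmetric positive-definite A, given only A @ p."""
    x = [0.0] * len(b)
    r = list(b)        # residual r0 = b - A x0 with x0 = 0
    p = list(r)        # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rs_old) < tol:
            break
        ap = matvec(p)                      # the SMVM step
        alpha = rs_old / dot(p, ap)         # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old              # keeps directions A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    print(conjugate_gradient(lambda v: [dot(row, v) for row in A], [1.0, 2.0]))
    # approximately [1/11, 7/11]
```

Apart from the `matvec` call, each iteration costs only a few vector updates and dot products, which is why the SMVM share of the run time is so large for big sparse systems.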

506 | Interconnection Networks: An Engineering Approach
- Duato, Yalamanchili, et al.
- 1997
Citation Context: ...ill be output; otherwise it waits until the output port is free. Here the Round-Robin strategy is selected to prevent the deadlock hazard when multiple packets arrive at a switch at the same time [35]. This switching approach is usually known as the store–and–forward method. It is a packet switching protocol in which the node stores the complete packet and forwards it based on the information wit...
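
Round-robin selection rotates priority among the input ports, so a port that has just been served becomes the lowest-priority requester and no port is starved. A minimal behavioral sketch, assuming one grant per cycle for a single output port; the class `RoundRobinArbiter` is hypothetical and does not reproduce the thesis's switch implementation.

```python
from typing import Optional, Sequence


class RoundRobinArbiter:
    """Grant at most one of `n` requesting inputs per cycle.

    The search for a winner starts just after the previous winner, so every
    input that keeps requesting is granted within `n` cycles.
    """

    def __init__(self, n: int) -> None:
        self.n = n
        self.last = n - 1  # makes input 0 the first to be checked

    def grant(self, requests: Sequence[bool]) -> Optional[int]:
        for offset in range(1, self.n + 1):
            idx = (self.last + offset) % self.n
            if requests[idx]:
                self.last = idx  # the winner drops to lowest priority
                return idx
        return None  # no input requested the output port this cycle


if __name__ == "__main__":
    arbiter = RoundRobinArbiter(4)
    # All four input ports contend for the same output port every cycle.
    print([arbiter.grant([True, True, True, True]) for _ in range(6)])  # [0, 1, 2, 3, 0, 1]
```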

495 | Overview of the Scalable Video Coding Extension of the H.264/AVC
- Schwarz, Marpe, et al.
- 2007
Citation Context: ...as been added [74]. Recently, H.264/AVC has been extended to support Scalable Video Coding (SVC) by the Joint Video Team of ITU-T VCEG and ISO MPEG (JVT) [100,101]. Furthermore, Ultra–HD (UHD) TV, with a resolution of 7,680×4,320 pixels (a.k.a. Super Hi-Vision), and the next-generation High Efficiency Video Coding (HEVC) standard will soon need very high throughput per...

245 | Thousand core chips: a technology perspective (DAC ’07)
- Borkar
- 2007
Citation Context: ...l processor arrays) [104, 120, 121, 132]. In addition, modular circuit design has grown into very large SoCs by utilizing reusable/configurable Intellectual Property (IP) cores as much as possible [17]. Soon the traditional bus transmission architecture will be unable to satisfy the need for more than a thousand cores on a single silicon die. Hence, a better network switching method is desired. These cha...

235 | A Fast Computational Algorithm for the Discrete Cosine Transform
- Chen, Smith, et al.
- 1977
Citation Context: ...ated into the DCT due to its regular structure, which is more efficient for VLSI implementation. In the literature, Alen Docef et al. [34] proposed a joint implementation of Chen’s DCT [23] and quantization (i.e. a conventional QDCT design). Later, Hanli Wang et al. [126] merged the quantization process, represented by a quantization matrix with variable quantization step sizes, into ...

193 | Power minimization in IC design: Principles and applications
- Pedram
- 1996
Citation Context: ...ircuit Design Issues: Source of Power Dissipation. Power consumption in CMOS technology can be described by a simple equation that summarizes the three most important contributors to its final value [15, 87]: $P_{\mathrm{Total}} = P_{\mathrm{Dynamic}} + P_{\mathrm{Short}} + P_{\mathrm{Leakage}}$ (2.1). These three components are dynamic power dissipation ($P_{\mathrm{Dynamic}}$), short-circuit power dissipation ($P_{\mathrm{Short}}$) and leakage power dissipation ($P_{\mathrm{Leakage}}$). PLeak...
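
Equation (2.1) only sums the three contributions. A common first-order way to estimate each term, assumed here and not taken from the cited text, is P_Dynamic = α·C_L·V_DD²·f, P_Short = I_sc·V_DD and P_Leakage = I_leak·V_DD. The sketch below uses illustrative numbers only; `power_breakdown` and its parameters are hypothetical.

```python
from typing import Dict


def power_breakdown(alpha: float, c_load: float, vdd: float, freq: float,
                    i_short: float, i_leak: float) -> Dict[str, float]:
    """First-order CMOS power estimate in watts (SI units throughout).

    alpha   -- switching activity factor (0..1)
    c_load  -- switched capacitance in farads
    vdd     -- supply voltage in volts
    freq    -- clock frequency in hertz
    i_short -- average short-circuit current in amperes
    i_leak  -- static leakage current in amperes
    """
    p_dynamic = alpha * c_load * vdd ** 2 * freq
    p_short = i_short * vdd
    p_leakage = i_leak * vdd
    return {
        "dynamic": p_dynamic,
        "short": p_short,
        "leakage": p_leakage,
        "total": p_dynamic + p_short + p_leakage,  # Eq. (2.1)
    }


if __name__ == "__main__":
    # Illustrative values: 1 nF switched at 100 MHz with 10 % activity, 1 V supply.
    print(power_breakdown(0.1, 1e-9, 1.0, 100e6, 1e-3, 2e-3))
```

Because the dynamic term grows with V_DD², halving the supply voltage cuts dynamic power to a quarter in this model, which is why voltage scaling is such a strong lever in low-power design.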

171 | A survey of research and practices of network-on-chip. ACM Comput.
- Bjerregaard, Mahadevan
Citation Context: ...which the node stores the complete packet and forwards it based on the information within its header. Thus the packet may stall if the router in the forwarding path does not have enough buffer space [16]. However, for a simple and easy implementation of the presented SMVM–NoC on the FPGA, store–and–forward routing is preferred initially. On the other hand, wormhole switching combines packet swit...
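
The contrast between store–and–forward and wormhole switching is often summarized with a contention-free latency model: with H hops, router delay t_r cycles, packet length L bits and link width b bits per cycle, store–and–forward needs about H·(t_r + L/b) cycles, whereas wormhole switching needs about H·t_r + L/b because the body flits are pipelined behind the header. The sketch below encodes this standard textbook model under those assumptions (no blocking, no buffer stalls, which the context notes can occur); the function names are hypothetical.

```python
def store_and_forward_latency(hops: int, router_delay: float,
                              packet_bits: int, link_bits_per_cycle: int) -> float:
    """Each router buffers the whole packet before forwarding it, so the
    serialization time L/b is paid again on every hop."""
    return hops * (router_delay + packet_bits / link_bits_per_cycle)


def wormhole_latency(hops: int, router_delay: float,
                     packet_bits: int, link_bits_per_cycle: int) -> float:
    """Only the header flit is routed hop by hop; the body flits follow in
    a pipeline, so L/b is paid once."""
    return hops * router_delay + packet_bits / link_bits_per_cycle


if __name__ == "__main__":
    # Illustrative numbers: 4 hops, 2-cycle routers, 256-bit packets, 32-bit links.
    print(store_and_forward_latency(4, 2, 256, 32))  # 40.0
    print(wormhole_latency(4, 2, 256, 32))           # 16.0
```

The gap grows with both packet length and hop count, which is the usual argument for wormhole switching once the simpler store–and–forward prototype works.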

169 | System-level power optimization: techniques and tools
- Benini, De Micheli
- 2000
Citation Context: .... A proper choice between an efficient algorithm and the energy budget for performing the function (whether implemented in hardware or software) strongly affects system performance and power dissipation [13,53]. • Behavioral level: after deciding whether the function is implemented in hardware or software, this stage targets the optimization of hardware resources and the optimization of the average nu...

135 | A Survey of CORDIC Algorithms for FPGA-based Computers
- Andraka
- 1998
Citation Context: ...s will be described and compared, particularly at the architecture level. We elaborate how to implement a CORDIC rotation with reasonable computational complexity by trading off throughput [6, 83]. Discrete Cosine Integer Transform (DCIT): a VLSI implementation of both forward and inverse CORDIC-based Quantized DCIT (QDCIT) is presented. This configurable architecture not only performs multiplier...

129 | H.264 and MPEG-4 Video Compression
- Richardson
- 2004
Citation Context: ...wareness and multi-standard integration. The DCT is a very important component of many modern image/video compression standards (e.g. JPEG, MPEG–2, MPEG–4, H.263 and so on [26, 94]). In the H.264 codec, a joint ITU-T and ISO/IEC standard also known as MPEG-4 Part 10 Advanced Video Coding (AVC), the DCT part is replaced by an integer transform using a block s...

126 | Practical fast 1-D DCT algorithms with 11 multiplications
- Loeffler, Ligtenberg, et al.
- 1989
Citation Context: ...a 1-D Loeffler DCT requires a total of 11 multiplications and 29 additions. Meanwhile, the theoretical lower bound on the number of multiplications required for the 1-D DCT had also been proven to be 11 [36, 73]. Implementing the Loeffler DCT with the CORDIC method, omitting some iterations with negligible effect and shifting the compensation steps of each angle into the quantizer, yields the simplified CLDCT. Figure 4...

114 | Network on a Chip: An architecture for billion transistor era
- Hemani, Jantsch, et al.
- 2000
Citation Context: ...delay is 10 times larger than the gate node delay, as shown in Figure 2.3. Moreover, data synchronization with a single clock source has also become a critical problem for circuit synthesis [14,51]. This means that timing closure in large SoC designs is difficult to achieve. Meanwhile, design resource reuse concerns all additional activities that have to be performed to generate...

99 | Low-complexity transform and quantization in H.264/AVC
- Malvar, Hallapuro, et al.
- 2003
Citation Context: ...CT part is replaced by an integer transform using a block size of 4×4 pixels. With the profiles added for High Definition (HD) video, an integer transform using a block size of 8×8 pixels has been added [74]. Recently, H.264/AVC has been extended to support Scalable Video Coding (SVC) by the Joint Video Team of ITU-T VCEG and ISO MPEG (JVT) [100,101]. Further...
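
The 4×4 integer transform mentioned above replaces the DCT's irrational cosine factors with small integers. A sketch of the forward core transform Y = C_f · X · C_fᵀ follows, using the core matrix defined for H.264/AVC; the post-scaling that H.264 folds into quantization is omitted, and the helper names are hypothetical. Since every coefficient is ±1 or ±2, a hardware version needs only additions and shifts.

```python
from typing import List

Matrix = List[List[int]]

# Forward core transform matrix C_f of the H.264/AVC 4x4 integer transform.
CF: Matrix = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def forward_core_transform(block: Matrix) -> Matrix:
    """Y = C_f * X * C_f^T on a 4x4 block of residuals, in integers only.

    The rows of C_f have different norms (squared norms 4, 10, 4, 10), so a
    codec must rescale Y; H.264 merges that scaling into quantization.
    """
    return matmul(matmul(CF, block), transpose(CF))


if __name__ == "__main__":
    flat = [[1] * 4 for _ in range(4)]
    print(forward_core_transform(flat))  # only the DC coefficient (16) is nonzero
```

Transforming the identity block yields diag(4, 10, 4, 10), which shows directly why the unequal row norms have to be compensated in the quantizer.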

96 | Leakage current: Moore’s law meets static power
- Kim, Austin, et al.
- 2003
Citation Context: ...inence of portable multimedia systems and the need to limit power consumption in very-high-density VLSI chips have led to rapid and innovative developments in low-power design in recent years [69,130]. The driving forces behind these developments are portable applications requiring low power dissipation, such as tablet computers, smartphones and portable embedded devices. In most of these cases, the...

95 | Ballistic carbon nanotube field-effect transistors
- Javey, Guo, et al.
- 2003
Citation Context: ...ists are trying to replace the current silicon-based MOSFET with a novel carbon-based Carbon Nanotube Field Effect Transistor (CNFET) or with spintronics in order to shrink the node size to the atomic level [7, 61, 86]. If this comes true, computing will enter a new era, “Beyond CMOS” (also known as “More Moore”). Unfortunately, this is probably not going to happen easily in the next 10 years. O...

85 | The solution of singular-value and symmetric eigenvalue problems on multiprocessor arrays
- Brent, Luk
- 1985
Citation Context: ...arallel with Jacobi’s iterative method is selected as an important example in this thesis, because the convergence of this method is very robust to modifications of the homogeneous processor elements [18,46,47,70]. It is simple, concise and inherently parallel in both implementation and computation. In [108,109], a Jacobi EVD array was realized by implementing the µ–CORDIC processor, which only performs a prede...
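
The Jacobi EVD discussed above is built from 2×2 plane rotations, each of which zeroes one off-diagonal pair. A minimal sequential (cyclic-by-row) Jacobi sketch is shown below, assuming exact floating-point rotations computed with `atan2`, `cos` and `sin`. It does not model the CORDIC or µ-rotation arithmetic, nor the parallel array ordering of the cited works, and `jacobi_eigenvalues` is a hypothetical name.

```python
import math
from typing import List

Matrix = List[List[float]]


def jacobi_eigenvalues(a: Matrix, max_sweeps: int = 20, tol: float = 1e-12) -> List[float]:
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations.

    Each plane rotation J(p, q, theta) is applied as A <- J^T A J and
    annihilates the pair a[p][q] = a[q][p]; one sweep visits every pair once.
    """
    n = len(a)
    a = [row[:] for row in a]  # leave the caller's matrix untouched
    for _ in range(max_sweeps):
        off = math.sqrt(sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Angle with tan(2*theta) = 2*a_pq / (a_qq - a_pp) zeroes a_pq.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for i in range(n):  # columns p and q: A <- A J
                    aip, aiq = a[i][p], a[i][q]
                    a[i][p] = c * aip - s * aiq
                    a[i][q] = s * aip + c * aiq
                for j in range(n):  # rows p and q: A <- J^T A
                    apj, aqj = a[p][j], a[q][j]
                    a[p][j] = c * apj - s * aqj
                    a[q][j] = s * apj + c * aqj
    return sorted(a[i][i] for i in range(n))


if __name__ == "__main__":
    print(jacobi_eigenvalues([[2.0, 1.0], [1.0, 2.0]]))  # close to [1.0, 3.0]
```

Rotations on disjoint index pairs touch disjoint rows and columns, which is the property a parallel Jacobi array exploits to apply many of them at once.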

78 | CORDIC-Based VLSI Architectures for Digital Signal Processing
- Hu
- 1992
Citation Context: ...$\sinh\varphi_t$, $z_n = 0$; hyperbolic vectoring: $x_n = \sqrt{x_0^2 - y_0^2}$, $y_n = 0$, $z_n = \operatorname{arctanh}(y_0/x_0) = \tanh^{-1}(y_0/x_0)$ ... by Walther [124]. A large variety of operations can easily be realized with the structure of the CORDIC algorithm [50,56,83]. There are three primary types of CORDIC algorithm: orthogonal, linear and hyperbolic. Each type has two basic operation modes, rotation and vectoring, which are summarized in Table 3.1. The...
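
Table 3.1 lists the rotation and vectoring modes of each CORDIC type. The sketch below emulates two of them in floating point: circular rotation (yielding cos and sin) and hyperbolic vectoring (yielding √(x₀² − y₀²) and arctanh(y₀/x₀), as in the hyperbolic vectoring row quoted in the context above). It is a behavioral sketch under stated assumptions, not the cited hardware: multiplication by 2^-i stands in for an i-bit shift, the gain K is divided out explicitly, hyperbolic shifts start at i = 1 and repeat i = 4, 13, 40, ... as is usual for convergence, and all function names are hypothetical.

```python
import math
from typing import List, Tuple


def hyperbolic_shifts(n_iter: int) -> List[int]:
    """Shift sequence 1, 2, 3, 4, 4, 5, ..., 13, 13, ...: hyperbolic CORDIC
    repeats steps i = 4, 13, 40, ... (k -> 3k + 1) so that it converges."""
    seq: List[int] = []
    repeat, i = 4, 1
    while len(seq) < n_iter:
        seq.append(i)
        if i == repeat:
            seq.append(i)
            repeat = 3 * repeat + 1
        i += 1
    return seq[:n_iter]


def cordic_circular_rotation(theta: float, n_iter: int = 40) -> Tuple[float, float]:
    """Circular rotation mode: drive z to 0 and return (cos theta, sin theta).

    Valid for |theta| up to about 1.74 rad (the sum of all atan(2**-i)).
    """
    gain = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(n_iter))
    x, y, z = 1.0 / gain, 0.0, theta  # pre-scale by 1/K instead of post-scaling
    for i in range(n_iter):
        d = 1.0 if z >= 0.0 else -1.0
        t = 2.0 ** -i  # an i-bit right shift in hardware
        x, y, z = x - d * y * t, y + d * x * t, z - d * math.atan(t)
    return x, y


def cordic_hyperbolic_vectoring(x: float, y: float, n_iter: int = 40) -> Tuple[float, float]:
    """Hyperbolic vectoring mode: drive y to 0 and return
    (sqrt(x**2 - y**2), atanh(y / x)) for x > 0 and |y / x| below about 0.80."""
    shifts = hyperbolic_shifts(n_iter)
    gain = math.prod(math.sqrt(1.0 - 4.0 ** -i) for i in shifts)
    z = 0.0
    for i in shifts:
        d = -1.0 if x * y >= 0.0 else 1.0  # rotate toward the x-axis
        t = 2.0 ** -i
        x, y, z = x + d * y * t, y + d * x * t, z - d * math.atanh(t)
    return x / gain, z


if __name__ == "__main__":
    print(cordic_circular_rotation(math.pi / 6))    # close to (0.8660, 0.5)
    print(cordic_hyperbolic_vectoring(1.0, 0.5))    # close to (0.8660, 0.5493)
```

The two modes differ only in the sign of the cross term, the angle table (atan versus atanh) and which variable is driven to zero; that regularity is what lets one configurable datapath cover the whole of Table 3.1.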

77 | DyAD: smart routing for networks-on-chip
- Hu, Marculescu
- 2004
Citation Context: ...ture with a mesh-style packet-switched network is an abstraction of the communication among components and must satisfy Quality-of-Service (QoS) requirements, such as reliability and performance [51, 54]. 2.4 Circuit Design Issues: Low Power. Besides modular circuit design, power dissipation has also come to be considered a critical constraint in the design of digital systems. One reason is the...

54 | System level performance evaluation of three-dimensional integrated circuits
- Rahman, Reif
- 2000
Citation Context: ...the next 10 years. On the other hand, other groups have proposed another potential way to keep Moore’s Law alive: using the Three-Dimensional IC (3D-IC) concept to increase transistor density [66,91]. As far as we can see, Through-Silicon Via (TSV) technology for 3D-ICs will be feasible before 2012. More and more evidence indicates that Moore’s Law is slowing down, especially the 200...

42 | Redundant and On-Line CORDIC: Application to Matrix Triangularization and SVD
- Ercegovac, Lang
- 1990
Citation Context: ...logy and in adaptive signal processing. There are many applications of the CORDIC algorithm, such as solving systems of linear equations [2,4,55,60], computation of eigenvalues and singular values [30,38,102], the Discrete Wavelet Transform (DWT) [28,103], the Discrete Cosine Transform (DCT) [75,112] and digital filters [31, 32, 115]. The CORDIC algorithm offers the opportunity to calculate all the desired functi...

42 | The Lanczos and Conjugate Gradient Algorithms: From Theory to Finite Precision Computations
- Meurant
Citation Context: ...sually more than 95%). The CG method is the most popular iterative method for numerically solving a system of linear equations A · x = b, where A is a Symmetric Positive-Definite (SPD) sparse matrix [52,77]. Moreover, Google’s PageRank (PR) eigenvalue problem is considered to be the world’s largest sparse matrix calculation. This problem is also dominated by SMVM operations, where the target matrix is ex...

36 | A gracefully degrading and energy-efficient modular router architecture for on-chip networks
- Kim, Nicopoulos, et al.
- 2006
Citation Context: ...are also important design criteria for embedded systems. For this reason, a conventional 5×5 crossbar switch is now separated into two smaller 3×3 crossbar switches in order to reduce the area overhead [68]. Figure 6.4 illustrates a typical switch component that lets the local PE communicate with its neighbor PEs; it consists of a set of input/output ports, dual-crossbar switches, four input FIFOs and controlle...

34 | Video Codec Design
- Richardson
- 2002
Citation Context: ...iven in Table 4.2, and QStep as given in Table 4.3. Equation 4.8 shows the typical case of the quantization phase when the inputs Fi are DC values from MPEG-4 (Luma or Chroma) in the DCT mode [93] or results from H.264 in the integer mode [94]. Note that the range of QPs for DC values is not linear, as indicated in Table 4.2. On the other hand, when the inputs are AC values for MPEG-4 intra fram...

28 | Highly Concurrent Computing Structure for Matrix Arithmetic and Signal Processing
- Ahmed, Delosme, et al.
- 1982
Citation Context: ...examples are algorithms used in digital communication technology and in adaptive signal processing. There are many applications of the CORDIC algorithm, such as solving systems of linear equations [2,4,55,60], computation of eigenvalues and singular values [30,38,102], the Discrete Wavelet Transform (DWT) [28,103], the Discrete Cosine Transform (DCT) [75,112] and digital filters [31, 32, 115]. The CORDIC algorith...

24 | Electrical modeling and characterization of through silicon via for three-dimensional ICs
- Katti, Stucchi, et al.
Citation Context: ...the next 10 years. On the other hand, other groups have proposed another potential way to keep Moore’s Law alive: using the Three-Dimensional IC (3D-IC) concept to increase transistor density [66,91]. As far as we can see, Through-Silicon Via (TSV) technology for 3D-ICs will be feasible before 2012. More and more evidence indicates that Moore’s Law is slowing down, especially the 200...

24 | A 65 nm 2-Billion Transistor Quad-Core Itanium Processor
- Stackhouse, Bhimji, et al.
- 2009
Citation Context: ...ing parallel computing, which has received great attention. It has been introduced in many state-of-the-art applications in the past few years (e.g. six-core CPUs, MPSoCs and parallel processor arrays) [104, 120, 121, 132]. In addition, modular circuit design has grown into very large SoCs by utilizing reusable/configurable Intellectual Property (IP) cores as much as possible [17]. Soon the traditional bus transmission ...

20 | An Efficient Jacobi-Like Algorithm for Parallel Eigenvalue Computation
- Götze, Paul, et al.
- 1993
Citation Context: ...arallel with Jacobi’s iterative method is selected as an important example in this thesis, because the convergence of this method is very robust to modifications of the homogeneous processor elements [18,46,47,70]. It is simple, concise and inherently parallel in both implementation and computation. In [108,109], a Jacobi EVD array was realized by implementing the µ–CORDIC processor, which only performs a prede...

19 | A leakage reduction methodology for distributed MTCMOS
- Calhoun, Honoré, et al.
- 2004
Citation Context: ... [78]. At the logic level, the power of a large SoC chip can be reduced by a further 10%–20% on average through low-power synthesis methodologies such as Multi–Voltage, Multi–Threshold CMOS (MTCMOS) or Power Gating [20, 67]. • Physical level: in the last stage, the circuit representation is converted into a layout of the chip. The layout is created by converting each logic component (cells, macros, gates or transistors) in...

19 | Pipelined CORDIC architectures for fast VLSI filtering and array processing
- Deprettere, Dewilde, et al.
- 1984
Citation Context: ...ms of linear equations [2,4,55,60], computation of eigenvalues and singular values [30,38,102], the Discrete Wavelet Transform (DWT) [28,103], the Discrete Cosine Transform (DCT) [75,112] and digital filters [31, 32, 115]. The CORDIC algorithm offers the opportunity to calculate all the desired functions and applications in a rather simple and elegant way in circuit design [6]. The CORDIC algorithm was first presented...

17 | Dynamic ordering for a parallel block-Jacobi SVD algorithm
- Becka, Oksa, et al.

16 | Low power design of DCT and IDCT for low bit rate video codecs
- August, Ha
- 2004
Citation Context: ...ut and supporting multiple functions for different codecs. Note that the inverse version of the FQDCIT can easily be obtained by reconfiguring the modules and flushing the lookup tables. Second, in [97] and [9], the cores achieve very high frequencies by optimizing the pipeline stages (assuming better arithmetic units), but provide very low throughput due to the serial output, i.e. ...

16 | An Algorithm and Architecture Based on Orthonormal Micro-Rotations for Computing the Symmetric EVD
- Götze, Hekstra
- 1995
Citation Context: ...eased number of rotations due to the behavior of Jacobi’s algorithm. Nevertheless, the computational complexity actually decreases, which also results in lower energy consumption per EVD operation [46,109]. The implementation results demonstrate that the simplified architecture is beneficial with respect to the design criteria, since it yields smaller area overhead, faster overall comp...

13 | Design tools for digital microfluidic biochips: Towards functional diversification and more than Moore
- Chakrabarty, Fair, et al.
Citation Context: ...h a very fast trend, and Moore’s Law is expected to hold for the next decade [1, 41] or to extend into the More than Moore concept (a prediction of the integration of more than a thousand cores before 2020) [21, 64, 95]. Ten years ago, with 0.35µm technology, design engineers focused on reducing the area to lower the cost. Later, with 0.13µm technology, they put huge effort into improving the signal...

13 | Low-power multiplierless DCT architecture using image correlation
- Jeong, Kim, et al.
- 2004
Citation Context: ...y CORDIC iterations

| Type/Operation | Mul | Add | W–Shift | B–Shift | Mux |
|---|---|---|---|---|---|
| Novel IQDCT [126] | 32 | 52 | 8 | 0 | 0 |
| scaled CORDIC DCT [139] | 4 | 64 + 80×6 | 80×6 | 0 | 0 |
| CORDIC DCT/IDCT [116] | 0 | 36 + 80×10 | 80×10 | 0 | 0 |
| CORDIC based DCT [62] | 0 | 208 | 176 | 0 | 0 |
| CORDIC Loeffler DCT [112] | 0 | 76 | 32 | 0 | 0 |
| IQDCIT–S1 | 0 | 92 | 24 | 24 | 44 |
| IQDCIT–S2 | 0 | 112 | 36 | 32 | 52 |
| IQDCIT (default) | 0 | 124 | 40 | 40 | 60 |
| IQDCIT–C1 | 0 | 184 | 80 | 56 | 76 |

4.5.1 Variable Iteration Steps of CORDIC ...

12 | Floating Point CORDIC
- Hekstra, Deprettere
- 1993
Citation Context: ...$\sinh\varphi_t$, $z_n = 0$; hyperbolic vectoring: $x_n = \sqrt{x_0^2 - y_0^2}$, $y_n = 0$, $z_n = \operatorname{arctanh}(y_0/x_0) = \tanh^{-1}(y_0/x_0)$ ... by Walther [124]. A large variety of operations can easily be realized with the structure of the CORDIC algorithm [50,56,83]. There are three primary types of CORDIC algorithm: orthogonal, linear and hyperbolic. Each type has two basic operation modes, rotation and vectoring, which are summarized in Table 3.1. The...

11 | The quantized DCT and its application to DCT-based video coding
- Docef, Kossentini, et al.
- 2002
Citation Context: ...ially, only uniform quantization can be incorporated into the DCT due to its regular structure, which is more efficient for VLSI implementation. In the literature, Alen Docef et al. [34] proposed a joint implementation of Chen’s DCT [23] and quantization (i.e. a conventional QDCT design). Later, Hanli Wang et al. [126] merged the quantization process represented by a quantization mat...

9 | Block-Jacobi SVD algorithms for Distributed Memory Systems II: Meshes
- Becka, Vajtersic
- 1999

8 | FPGA architecture and implementation of sparse matrix-vector multiplication for the finite element method
- El-Kurdi, Fernandez, et al.
- 2008
Citation Context: ...ed on obtaining a numerically approximate solution for a given mathematical model of a structure. The resulting linear system is characterized by the system matrix A, which is usually large and sparse [29,37,83,98]. In general, iterative solvers such as the Conjugate Gradient (CG) method are almost entirely dominated by SMVM operations (usually more than 95%). The CG method is the most popular iterative method for num...

7 | New 2^n DCT algorithms suitable for VLSI implementation
- Duhamel, H’Mida
- 1987
Citation Context: ...a 1-D Loeffler DCT requires a total of 11 multiplications and 29 additions. Meanwhile, the theoretical lower bound on the number of multiplications required for the 1-D DCT had also been proven to be 11 [36, 73]. Implementing the Loeffler DCT with the CORDIC method, omitting some iterations with negligible effect and shifting the compensation steps of each angle into the quantizer, yields the simplified CLDCT. Figure 4...

7 | The Quantum Limit to Moore’s Law
- Powell
- 2008
Citation Context: ... more than a billion transistors. In Appendix A, Table A.1 gives more detailed information on each x86 CPU model. In the future, Moore’s Law will continue until 2020 [41,90] or maybe even further. After that, CMOS technology will soon reach its physical limits when the node size becomes smaller than 10nm. Now many scientists are trying to replace the current silicon based ...

5 | Improved SVD systolic array and implementation on FPGA
- Ahmedsaid, Amira, et al.
- 2003
Citation Context: ...implify the regular CORDIC architecture will be clarified. As process technologies continue to shrink to the VDSM level, it becomes possible to directly implement a fully parallel Jacobi EVD array [3, 72]. However, the size of an EVD array with regular CORDIC that could be implemented on current devices is still small. Therefore, it is necessary to simplify the architecture in order to integrate more ...

5 | An efficient 2-D DCT/IDCT core design using cyclic convolution and adder-based realization
- Guo, Ju, et al.
- 2004
Citation Context: ...Of course, we can further reduce the critical path delay by replacing the default “Ripple Carry Adder” in the current implementation with a “Carry Save Adder” or “Carry Look-ahead Adder” in each pipeline stage [49]. On the other hand, the size of the core can be further reduced if we remove the pipeline and fold the row-column decomposition into one single DCIT and one CORDIC-Scaler with a buffer memory [9]. ...

5 | Application of Coordinate Rotation Algorithm to Singular Value Decomposition
- Sibul, Fogelsanger
- 1984
Citation Context: ...logy and in adaptive signal processing. There are many applications of the CORDIC algorithm, such as solving systems of linear equations [2,4,55,60], computation of eigenvalues and singular values [30,38,102], the Discrete Wavelet Transform (DWT) [28,103], the Discrete Cosine Transform (DCT) [75,112] and digital filters [31, 32, 115]. The CORDIC algorithm offers the opportunity to calculate all the desired functi...

5 | Low-power and high-quality Cordic-based Loeffler DCT for signal processing
- Sun, Ruan, et al.
- 2007
Citation Context: ...arge (Full-HD/Ultra-HD). Moreover, it still retains acceptable transformation quality in terms of PSNR compared to the default methods. This leads to a high-accuracy, high-throughput implementation [106,107,112,113]. Parallel Jacobi EVD method: the parallel Jacobi method for Eigenvalue Decomposition (EVD) is chosen as an example to explain the design concepts concerning the tradeoff between the complexity and the iterati...

4 | Floating-point sparse matrix-vector multiply for FPGAs
- deLorimier, DeHon
- 2005
Citation Context: ...ed on obtaining a numerically approximate solution for a given mathematical model of a structure. The resulting linear system is characterized by the system matrix A, which is usually large and sparse [29,37,83,98]. In general, iterative solvers such as the Conjugate Gradient (CG) method are almost entirely dominated by SMVM operations (usually more than 95%). The CG method is the most popular iterative method for num...

4 | Synthesis and Fixed Point Implementation of Pipelined True Orthogonal Filters
- Deprettere, Ed
- 1983
Citation Context: ...ms of linear equations [2,4,55,60], computation of eigenvalues and singular values [30,38,102], the Discrete Wavelet Transform (DWT) [28,103], the Discrete Cosine Transform (DCT) [75,112] and digital filters [31, 32, 115]. The CORDIC algorithm offers the opportunity to calculate all the desired functions and applications in a rather simple and elegant way in circuit design [6]. The CORDIC algorithm was first presented...

4 | Scaling: More than Moore’s law
- Kahng
- 2010
Citation Context: ...h a very fast trend, and Moore’s Law is expected to hold for the next decade [1, 41] or to extend into the More than Moore concept (a prediction of the integration of more than a thousand cores before 2020) [21, 64, 95]. Ten years ago, with 0.35µm technology, design engineers focused on reducing the area to lower the cost. Later, with 0.13µm technology, they put huge effort into improving the signal...

4 | ASIC and FPGA implementations of H.264 DCT and quantization blocks
- Kordasiewicz, Shirani
- 2005
Citation Context: ...final version of the chip ready for fabrication is shown in Figure 4.15. Note that this design requires a total of 198 IO pads, which would result in low chip density, as in [71]. Hence we have to remove the IO pads and treat the design as a macro IP. The input and output IOs are located on the left and lower sides of the chip layout. Table 4.8 lists the comparison between our ...

4 | Adaptive Up-Sampling Method Using DCT for Spatial Scalability of Scalable Video Coding
- Shin, Park
- 2009
Citation Context: ...as been added [74]. Recently, H.264/AVC has been extended to support Scalable Video Coding (SVC) by the Joint Video Team of ITU-T VCEG and ISO MPEG (JVT) [100,101]. Furthermore, Ultra–HD (UHD) TV, with a resolution of 7,680×4,320 pixels (a.k.a. Super Hi-Vision), and the next-generation High Efficiency Video Coding (HEVC) standard will soon need very high throughput per...

4 | Low-complexity multi-purpose IP Core for quantized Discrete Cosine and integer transform
- Sun, Donner, et al.
- 2009
Citation Context (list of figures): ...H.264; 4.4 Flow graph of the 8–point integer transform in H.264; 4.5 Flow graph of an 8–point FDCIT Transform with five configurable modules for multiplierless DCT and integer transforms [106]; 4.6 Three sub flow graphs of the modules of Figure 4.5; 4.7 Flow graph of an 8–point IDCIT Transform with seven configurable modules for multi...

3 | BinDCT and Its Efficient VLSI Architecture for Real-Time Embedded Applications
- Dang, Chau, et al.
- 2005
Citation Context: ...MHz 0.967 N/A AIR RC TB Pipe; 2006 [97] 8×8 DCT/IDCT 0.35um 3.3V 3.00 11.7(K) N/A 300 MHz 2.4 178 cycles FGA RC TB Pipe; 2005 [71] 4×4 QInt 0.18um 1.8V N/A 51.6(K) N/A 68 MHz 10.3 N/A FGA Pipe PL; 2005 [27] 8×8 DCT 0.18um 1.55V 0.12(Chip) N/A N/A 5 MHz 0.24 46 cycles FGA RC Pipe TB; 2004 [40] 8×8 DCT 0.18um 2.16 N/A 7.5(mW) N/A 0.9 80 cycles AIR RC TBRAM; 2004 [9] 8×8 DCT 0.18um 1.8V N/A 14.7(K) N/A 180 M...

3 | A Processor for Two-Dimensional Symmetric Eigenvalue and Singular Value Arrays
- Delosme
- 1987
Citation Context: ...logy and in adaptive signal processing. There are many applications of the CORDIC algorithm, such as solving systems of linear equations [2,4,55,60], computation of eigenvalues and singular values [30,38,102], the Discrete Wavelet Transform (DWT) [28,103], the Discrete Cosine Transform (DCT) [75,112] and digital filters [31, 32, 115]. The CORDIC algorithm offers the opportunity to calculate all the desired functi...

3 | Early Determination of Zero-Quantized 8...
- Ji, Kwong, et al.
Citation Context: ... decreasing the bit length of the high-frequency coefficients [84]. The zero prediction algorithm can also reduce power consumption significantly by omitting unnecessary computations [63, 127, 140]. These two methods reduce the computational effort and power consumption by trading off transformation quality in PSNR. Finally, the VLSI implementation results show that the presented CORDI...

3 | An FPGA architecture for the PageRank eigenvector problem
- McGettrick, Geraghty, et al.
- 2008
Citation Context: ...ed a multi-core environment, using the heterogeneous x86-based quad-core CPU to accelerate the SMVM computation in parallel. Google’s PR problem has also been investigated for FPGA acceleration in [76, 141]. 6.2 SMVM on Network-on-Chip. Conventional SMVM architectures usually focus on a dedicated internal chip interconnection to forward vector components and nonzero matrix elements among severa...
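
The PageRank problem mentioned above is usually solved by power iteration, in which every step is essentially one SMVM over the link matrix. A minimal sketch, assuming the common damping factor 0.85 and uniform redistribution of the rank held by pages without out-links (both are assumptions, not details from the cited works); `pagerank` and `links` are hypothetical names.

```python
from typing import Dict, List


def pagerank(links: Dict[int, List[int]], n: int, damping: float = 0.85,
             tol: float = 1e-12, max_iter: int = 200) -> List[float]:
    """PageRank by power iteration on pages 0..n-1.

    links[u] lists the pages that page u links to. Each iteration is one
    sparse matrix-vector product over the link structure, plus a uniform
    redistribution of the rank held by pages without out-links.
    """
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        new = [(1.0 - damping) / n] * n  # teleportation term
        dangling = 0.0
        for u in range(n):
            targets = links.get(u, [])
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:  # one nonzero of the link matrix per edge
                    new[v] += share
            else:
                dangling += damping * rank[u] / n
        new = [r + dangling for r in new]
        if sum(abs(a - b) for a, b in zip(new, rank)) < tol:
            return new
        rank = new
    return rank


if __name__ == "__main__":
    # Hypothetical 3-page web: 0 -> 1, 0 -> 2, 1 -> 2; page 2 has no out-links.
    print(pagerank({0: [1, 2], 1: [2]}, 3))
```

Only the inner loop over `targets` touches the link structure, so on a web-scale graph almost all of the time goes into that sparse product, which is what FPGA accelerators for PageRank target.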

3 | CORDIC-based architectures for the efficient implementation of discrete wavelet transforms
- Simon, Rieder, et al.
- 1996
Citation Context: ...are many applications of the CORDIC algorithm, such as solving systems of linear equations [2,4,55,60], computation of eigenvalues and singular values [30,38,102], the Discrete Wavelet Transform (DWT) [28,103], the Discrete Cosine Transform (DCT) [75,112] and digital filters [31, 32, 115]. The CORDIC algorithm offers the opportunity to calculate all the desired functions and applications in a rather simple and...

3 | Sparse Matrix-Vector Multiplication Design on FPGAs
- Sun, Peterson, Storaasli
- 2007
Citation Context: ...tremely sparse and unstructured. In the last decade, many researchers have worked on exploiting the pipelining and parallelism inherent in SMVM computation in hardware designs. Sun et al. [114] proposed an SMVM design containing many Processing Elements (PEs) with pipelined floating-point units in an FPGA. Gregg et al. [39] built a specialized memory controller to accelerate SMVM. Götze an...

3 | Parallel VLSI Implementation of the Kalman Filter
- Sung, Hu
- 1987
Citation Context: ...ms of linear equations [2,4,55,60], computation of eigenvalues and singular values [30,38,102], the Discrete Wavelet Transform (DWT) [28,103], the Discrete Cosine Transform (DCT) [75,112] and digital filters [31, 32, 115]. The CORDIC algorithm offers the opportunity to calculate all the desired functions and applications in a rather simple and elegant way in circuit design [6]. The CORDIC algorithm was first presented...

2 | Performance modeling for adaptive parallel embedded systems
- Asaad, Bapty
- 2002
Citation Context: ...processor platforms have received great attention and have been realized in several state-of-the-art applications (e.g. six-core CPUs, MPSoCs and parallel processor arrays) [8, 17, 132]. Besides parallelism, the ability to integrate all parts of an application on the same piece of silicon is also beneficial for lower power, greater reliability and reduced cost of manufact...

2 |
Cell Switched Network-on-Chip Candidate for Billion-Transistor System-on-Chips
- Chen
- 2006
Citation Context ...o the local processor element, leakage power becomes a major factor of the power consumption, and shared bus transmission is the new bottleneck in the billion transistors System–on–Chip (SoC) designs =-=[24, 99, 125]-=-. On the other hand, as product life cycle continues to shrink simultaneously, time–to–market also becomes a key design constraint. In consequence, these problems result in the famous “designer produc... |

2 |
A low-power DCT IP core based on 2D algebraic integer encoding
- Fu, Jullien, et al.
- 2004
Citation Context ...MHz 2.4 178 cycles FGA RC TB Pipe 2005 [71] 4×4 QInt 0.18um 1.8V N/A 51.6(K) N/A 68 MHz 10.3 N/A FGA Pipe PL 2005 [27] 8×8 DCT 0.18um 1.55V 0.12(Chip) N/A N/A 5 MHz 0.24 46 cycles FGA RC Pipe TB 2004 =-=[40]-=- 8×8 DCT 0.18um 2.16 N/A 7.5(mW) N/A 0.9 80 cycles AIR RC TBRAM 2004 [9] 8×8 DCT 0.18um 1.8V N/A 14.7(K) N/A 180 MHz 0.352 392 cycles MUXRC TB 2000 [22] 8×8 DCT 0.6um 2.0V 50.5 38(K) 138(mW) 100 MHz N... |

2 |
Sparse matrix-vector multiplication on a systolic array
- Gotze, Schwiegelshohn
- 1988
Citation Context ...design containing many Processing Elements (PEs) with pipelined floating-point units in FPGA. Gregg et al. [39] built a specialized memory controller to accelerate the SMVM. Götze and Schwiegelshohn =-=[48]-=- presented a systolic algorithm which allows the parallel execution of SMVM in a dedicated VLSI circuit. Williams et al. [129] used multi-core environment, using the heterogeneous x86 based quad-core ... |

2 |
VLSI CORDIC Array Structure Implementation of Toeplitz Eigensystem Solvers
- Hu, Chern
- 1990
Citation Context ...examples are algorithms used in digital communication technology and in adaptive signal processing. There are many applications using the CORDIC algorithm, such as solving systems of linear equations =-=[2,4,55,60]-=-, computation of eigenvalues and singular values [30,38,102], Discrete Wavelet Transform (DWT) [28,103], Discrete Cosine Transform (DCT) [75,112] and digital filters [31, 32, 115]. The CORDIC algorith... |

2 |
International Technology Roadmap for Semiconductors, 2009 Edition, Executive Summary
- ITRS
- 2009
Citation Context ...ntil 2010; 2.2 IC scaling roadmap for More than Moore (modified figure from 2009 International Technology Roadmap for Semiconductors Executive Summary) =-=[58]-=-; 2.3 Relative delays of interconnection wire and gate in nanoscale level (regenerated figure from International Technology Roadmap for Semiconductors 2003) [57] ... |

2 |
Hardware architectures for eigenvalue computation of real symmetric matrices
- Liu, Bouganis, et al.
- 2009
Citation Context ...implify the regular CORDIC architecture will be clarified. As the process technologies continue to shrink to the VDSM level, it becomes possible to directly implement a full parallel Jacobi EVD array =-=[3, 72]-=-. However, the size of EVD array with the regular CORDIC that could be implemented on current device is still small. Therefore, it is necessary to simplify the architecture in order to integrate more ... |

2 | Dynamic bitwidth adaptation in DCT: image quality versus computation energy trade-off - Park, Choi, et al. - 2006 |

2 |
Digital VLSI logic technology using Carbon Nanotube FETs: frequently asked questions
- Patil, Lin, et al.
- 2009
Citation Context ...ists are trying to replace the current silicon based MOSFET by a novel carbon based Carbon Nanotube Field Effect Transistor (CNFET) or spintronics in order to shrink the node size into the atom level =-=[7, 61, 86]-=-. If it comes true, the computer will usher in a new era “Beyond CMOS” (also known as “More Moore”). Unfortunately, it seems that this is probably not going to happen so easily in the next 10 years. O... |

2 |
Power Aware Design Methodologies
- unknown authors
- 2002
Citation Context ...stem Drivers) [59]; 2.5 A typical NoC architecture with a mesh-style packet-switched network; 2.6 Power reduction at each design level =-=[88]-=-; 2.7 A simple CMOS inverter; 2.8 There are four components of leakage sources in NMOS: Subthreshold leakage (ISub), Gate-oxide leakage (IGate... |

2 |
Integrated MEMS Switches for Leakage Control of Battery Operated Systems
- Raychowdhury, Kim, et al.
- 2006
Citation Context ...ns, typically lines, in multiple layers. Various power optimization techniques such as partitioning, fine placement, MEMS based power switch, transistor resizing, dynamic voltage scaling are employed =-=[89, 92]-=-. However, only 5%–10% power reductions could be obtained at this level. 2.6 Circuit Design Issues: Source of Power Dissipation Power consumption in a CMOS technology can be described by a simple equ... |

2 |
Integrated Systems in the More-than-Moore Era: Designing Low-Cost Energy-Efficient Systems Using Heterogeneous Components
- Roy, Jung, et al.
- 2010
Citation Context ...h a very fast trend and Moore’s Law is expected to hold for the next decade [1, 41] or extend to the More than Moore concept (a prediction for the integration of more than thousand cores before 2020) =-=[21, 64, 95]-=-. 10 years ago, for 0.35µm technology, design engineers focused on reducing the area size to lower down the cost. Later, when it came to 0.13µm technology, they paid huge efforts to improve the signal... |

2 |
VLSI Circuit Design Concepts for Parallel Iterative Algorithms in Nanoscale
- Sun, Götze
- 2009
Citation Context ...tations CORDIC with 32–bit accuracy, showing the rotation type, the 2× tan θ angle, the required shift–add operations for rotation and scaling, the required cycle delay and repeat number for CORDIC–6 =-=[109]-=-; 5.2 Area, Delay and Power Consumption results of 4×4 and 10×10 Jacobi EVD arrays with the TSMC 45nm technology ... |

1 |
Years: 3D chip stacking will take Moore’s Law past 2020
- Moore’s
- 2010
Citation Context ...dern Very Large Scale Integration (VLSI) manufacturing technology has kept shrinking down to Very Deep Sub-Micron (VDSM) with a very fast trend and Moore’s Law is expected to hold for the next decade =-=[1, 41]-=- or extend to the More than Moore concept (a prediction for the integration of more than thousand cores before 2020) [21, 64, 95]. 10 years ago, for 0.35µm technology, design engineers focused on redu... |

1 |
A VLSI-Suited Algorithm for Solving Linearly Constrained Least Squares Problems
- Ali, Götze
- 1991
Citation Context ...examples are algorithms used in digital communication technology and in adaptive signal processing. There are many applications using the CORDIC algorithm, such as solving systems of linear equations =-=[2,4,55,60]-=-, computation of eigenvalues and singular values [30,38,102], Discrete Wavelet Transform (DWT) [28,103], Discrete Cosine Transform (DCT) [75,112] and digital filters [31, 32, 115]. The CORDIC algorith... |

1 |
Silicon spintronics
- Appelbaum
- 2009
Citation Context ...ists are trying to replace the current silicon based MOSFET by a novel carbon based Carbon Nanotube Field Effect Transistor (CNFET) or spintronics in order to shrink the node size into the atom level =-=[7, 61, 86]-=-. If it comes true, the computer will usher in a new era “Beyond CMOS” (also known as “More Moore”). Unfortunately, it seems that this is probably not going to happen so easily in the next 10 years. O... |

1 |
The Cadence Website. www.cadence.com
- Cadence
- 2010
Citation Context ...r, these RTL codes are synthesized by Synopsys Design Compiler 2009 with TSMC 0.18µm standard cell libraries. At the end, the final Place & Route stage is performed with the Cadence SoC Encounter 8.1 =-=[19,117]-=-. The final version of the chip ready for fabrication is shown in Figure 4.15. Note that the total number of the IOs required for this design is 198 IO pads, which would result in a low chip density a... |

1 |
A Low Power 8×8
- Chang, Jiu, et al.
- 2000
Citation Context .../A N/A 5 MHz 0.24 46 cycles FGA RC Pipe TB 2004 [40] 8×8 DCT 0.18um 2.16 N/A 7.5(mW) N/A 0.9 80 cycles AIR RC TBRAM 2004 [9] 8×8 DCT 0.18um 1.8V N/A 14.7(K) N/A 180 MHz 0.352 392 cycles MUXRC TB 2000 =-=[22]-=- 8×8 DCT 0.6um 2.0V 50.5 38(K) 138(mW) 100 MHz N/A 198 cycles DA PL 1998 [135] 8×8 IDCT 0.7um 1.3V 20.7(Chip) 40(K) 4.65(mW) 14 MHz 0.896 N/A RC TB Pipe DA (Distributed Arithmetic), FGA (Flow-Graph Ar... |

1 |
An extension of J.83 annex B transmission systems for ultra-high definition (UD) TV broadcasting
- Cho, Heo, et al.
- 2009
Citation Context ...more, Ultra–HD (UHD) TV, producing a 7,680×4,320 pixel resolution (a.k.a. Super Hi-Vision) and the next generation High Efficiency Video Coding (HEVC), will soon need very high throughput performance =-=[25, 79]-=-. Therefore, the multifunction integration of different Codecs with very high throughput is a challenge to design engineers implementing the Codecs for supporting 3D/UHD displays. A low–power and high... |

1 |
A low complexity architecture for complex discrete wavelet transform
- Das, Banerjee
- 2003
Citation Context ...are many applications using the CORDIC algorithm, such as solving systems of linear equations [2,4,55,60], computation of eigenvalues and singular values [30,38,102], Discrete Wavelet Transform (DWT) =-=[28,103]-=-, Discrete Cosine Transform (DCT) [75,112] and digital filters [31, 32, 115]. The CORDIC algorithm offers the opportunity to calculate all the desired functions and applications in a rather simple and... |

1 |
A 212 MPixels/s 4096×2160p Multiview Video Encoder Chip for 3D/Quad Full HDTV Applications
- Ding, Chen, Tsung, et al.
- 2010
Citation Context ...y to the previous approaches, the proposed one processes 8 samples in parallel per clock cycle, leading to a higher throughput, which is very suitable for Full–HD or even future 4K/8K UHD resolution =-=[25,33]-=-. This, however, does not lead to a very high increase in the chip area. The latency is shorter than other designs due to the fine–grained pipeline architecture. The critical timing delay is 7.121ns. O... |

1 |
FPGA Based Sparse Matrix Vector Multiplication using Commodity DRAM Memory
- Gregg, et al.
- 2007
Citation Context ...elism inherent in the SMVM computation in hardware designs. Sun et al. [114] proposed a SMVM design containing many Processing Elements (PEs) with pipelined floating-point units in FPGA. Gregg et al. =-=[39]-=- built a specialized memory controller to accelerate the SMVM. Götze and Schwiegelshohn [48] presented a systolic algorithm which allows the parallel execution of SMVM in a dedicated VLSI circuit. Wi... |

1 |
Moore’s Law: “We See No End in Sight”
- Gelsinger
- 2008
Citation Context ...dern Very Large Scale Integration (VLSI) manufacturing technology has kept shrinking down to Very Deep Sub-Micron (VDSM) with a very fast trend and Moore’s Law is expected to hold for the next decade =-=[1, 41]-=- or extend to the More than Moore concept (a prediction for the integration of more than thousand cores before 2020) [21, 64, 95]. 10 years ago, for 0.35µm technology, design engineers focused on redu... |

1 | European Union Patent Application: EP1850597, Method and Circuit for Performing CORDIC Based Loeffler Discrete Cosine Transformation (DCT) for Signal Processing - Goetze, Heyne, et al. - 2007 |

1 | USA Pending Patent: US20070250557, Method and Circuit for Performing CORDIC Based Loeffler Discrete Cosine Transformation (DCT) for Signal Processing - Goetze, Heyne, et al. - 2007 |

1 | Taiwan Patent: TW200600143651, Method and Circuit for Performing CORDIC Based Loeffler Discrete Cosine Transformation (DCT) for Signal Processing - Goetze, Heyne, et al. - 2010 |

1 |
Efficient CORDIC Based Implementation of Selected Signal Processing Algorithms
- Heyne
- 2008
Citation Context .... A proper choice between the efficient algorithm and energy budget for performing the function (whether implemented in hardware or software) strongly affects system performance and power dissipation =-=[13,53]-=-. • Behavioral level - After determining the implementation of the function by hardware or software, this stage targets on the optimization of hardware resources and the optimization of the average nu... |

1 |
A New Class of Parallel Algorithms for Solving Systems of Linear Equations
- Jainandunsing, Deprettere
- 1989

1 |
Low Power Methodology Manual for System-on-Chip Design
- Keating, Flynn, et al.
- 2008
Citation Context ...een the sequential elements are described in a HDL program such as Verilog or VHDL. Moreover, the right choice of clock optimization strategy and pipelining will strongly affect the power consumption =-=[67, 138]-=-. • Logic level - The goal of this level is to generate a structural view of a logic-level model. Logic synthesis is the manipulation of logic specifications to create logic models as interconnection ... |

1 |
Low Power Enhancements for Parallel Algorithms
- Klauke, Götze
- 2001
Citation Context ...arallel with Jacobi’s iterative method is selected as an important example in this thesis, because the convergence of this method is very robust to modifications of the homogeneous processor elements =-=[18,46,47,70]-=-. It is simple, concise and inherently parallel for both implementation and computation. In [108,109], a Jacobi EVD array was realized by implementing the µ–CORDIC processor, which only performs a prede... |

1 |
A fast DCT processor, based on special purpose CORDIC rotators
- Mariatos, Metafas, et al.
- 1994
Citation Context ...orithm, such as solving systems of linear equations [2,4,55,60], computation of eigenvalues and singular values [30,38,102], Discrete Wavelet Transform (DWT) [28,103], Discrete Cosine Transform (DCT) =-=[75,112]-=- and digital filters [31, 32, 115]. The CORDIC algorithm offers the opportunity to calculate all the desired functions and applications in a rather simple and elegant way in circuit design [6]. The CO... |

1 |
Synthesis and Optimization of Digital Circuits
- De Micheli
- 1994
Citation Context ...ance (netlist) of library cells (i.e. the back–end logic synthesis tools), is often referred to as a library binding or a technology mapping =-=[78]-=-. At logic level, low power synthesis for a large SoC chip can be further reduced in average 10%–20% by applying these methodologies: Multi–Voltage, Multi–Threshold CMOS (MTCMOS) or Power Gating [20, ... |

1 |
Steps Toward the Practical Use of Super Hi-Vision
- Mikio
- 2006
Citation Context ...more, Ultra–HD (UHD) TV, producing a 7,680×4,320 pixel resolution (a.k.a. Super Hi-Vision) and the next generation High Efficiency Video Coding (HEVC), will soon need very high throughput performance =-=[25, 79]-=-. Therefore, the multifunction integration of different Codecs with very high throughput is a challenge to design engineers implementing the Codecs for supporting 3D/UHD displays. A low–power and high... |

1 |
Digital Signal Processing for Multimedia Systems
- Parhi, Nishitani
- 1999
Citation Context ...s will be described and compared, particularly on the architecture level. We elaborate the way of implementing a CORDIC rotation with reasonable computational complexity by trading off the throughput =-=[6, 83]-=-. Discrete Cosine Integer Transform (DCIT) VLSI implementation of both forward and inverse CORDIC based Quantized DCIT (QDCIT) is presented. This configurable architecture not only performs multiplier... |

1 |
Power Optimization in VLSI Layout: A Survey
- Pedram, Vaishnav
- 1997
Citation Context ...ns, typically lines, in multiple layers. Various power optimization techniques such as partitioning, fine placement, MEMS based power switch, transistor resizing, dynamic voltage scaling are employed =-=[89, 92]-=-. However, only 5%–10% power reductions could be obtained at this level. 2.6 Circuit Design Issues: Source of Power Dissipation Power consumption in a CMOS technology can be described by a simple equ... |

1 |
Delay and Power Minimization in VLSI Interconnects with SpatioTemporal Bus-Encoding Scheme
- Sainarayanan, Raghunandan, et al.
- 2007
Citation Context ...o the local processor element, leakage power becomes a major factor of the power consumption, and shared bus transmission is the new bottleneck in the billion transistors System–on–Chip (SoC) designs =-=[24, 99, 125]-=-. On the other hand, as product life cycle continues to shrink simultaneously, time–to–market also becomes a key design constraint. In consequence, these problems result in the famous “designer produc... |

1 |
The JM 16.1 H.264/AVC. iphome.hhi.de/suehring/tml
- Suehring
- 2010
Citation Context ...e proposed 2-D forward and inverse QDCIT transformations have been tested with the video coding standard MPEG–4 and H.264 by using publicly available software, XVID Codec 1.2.2 [137] and JM16.1 Codec =-=[105]-=-. The default DCT algorithm in the Codec of the ... |

1 |
VLSI Implementation of a Configurable IP Core for Quantized Discrete Cosine and Integer Transforms
- Sun, Donner, Götze
- 2011
Citation Context ...arge (Full-HD/Ultra-HD). Moreover, it still retains an acceptable transformation quality compared to the default methods in terms of PSNR. This leads to a high-accuracy high throughput implementation =-=[106,107,112,113]-=-. Parallel Jacobi EVD method Parallel Jacobi method for Eigenvalue Decomposition (EVD) is chosen as an example to explain the design concepts concerning tradeoff between the complexity and the iterati... |

1 |
- Sun, Götze, Jheng, Ruan
Citation Context ...sented as a novel solution for parallel matrix computation to further accelerate many iterative solvers in hardware, such as solving systems of linear equations, Finite Element Method (FEM) and so on =-=[110]-=-. This methodology is called Network–on–Chip (NoC). Using NoC architecture allows the parallel processors to deal with irregular structure of the sparse matrices and achieve a high performance in FPGA... |

1 |
Sparse Matrix-Vector Multiplication Based on Network-on-Chip in FPGA
- Sun, Jheng, et al.
- 2010
Citation Context ...1) + β·p_k; k ⇐ k + 1; end while. 6.2.3 Basic Idea: In order to accelerate the performance of the iterative solvers, a SMVM platform was built based on the NoC architecture according to the earlier reports =-=[110, 111]-=-, which contains p × p PEs (p = 2, 4, 8, ..., 2^k, k ∈ N). It is connected by a 2–D mesh network. The packet forwarding function is used to transmit the data. For instance, Figure 6.2 shows a simple ... |

1 | A Configurable IP Core for Inverse Quantized Discrete Cosine and Integer Transform with Arbitrary Accuracy - Sun, Zhang, et al. - 2010 |