10612 | Introduction to Algorithms.
- Cormen, Leiserson, et al.
- 1990
Citation Context ...Section 6 contains our conclusions. 2. Parallel Bordered-Diagonal-Block Sparse LU Factorization. 2.1. Introduction to LU Factorization. This section presents an overview of the LU factorization problem =-=[4, 28, 29, 30]-=-. Solving the following system of N linear equations is the core computation of many engineering and scientific applications: A*x = b (1), where A is an N x N nonsingular matrix, x is a vector of N unkn... |

658 |
Direct Methods for Sparse Matrices
- Duff, Erisman, et al.
- 1986
Citation Context ...Section 6 contains our conclusions. 2. Parallel Bordered-Diagonal-Block Sparse LU Factorization. 2.1. Introduction to LU Factorization. This section presents an overview of the LU factorization problem =-=[4, 28, 29, 30]-=-. Solving the following system of N linear equations is the core computation of many engineering and scientific applications: A*x = b (1), where A is an N x N nonsingular matrix, x is a vector of N unkn... |

194 | A Decade of Reconfigurable Computing: a Visionary Retrospective. In: - Hartenstein - 2001 |

133 |
Reconfigurable Computing: A Survey of Systems and
- Compton, Hauck
- 2002
Citation Context ...ardware functionality with the same hardware. FPGA-based (re)configurable systems can be used as specialized co-processors [16], processor-attached functional units or independent processing machines =-=[7]-=-, attached message routers in parallel machines [17], general-purpose processors for unconventional designs [17], and general-purpose [16, 20] or specialized systems for parallel processing [12, 19]. ... |

95 | An asynchronous parallel supernodal algorithm for sparse Gaussian elimination.
- Gilbert, Demmel, et al.
- 1999
Citation Context ...etermines the matrix elements on the ith row, where i assumes all values in [1, N]:
$$L_{ij} = \Bigl(A_{ij} - \sum_{k=1}^{j-1} L_{ik} U_{kj}\Bigr) \cdot \frac{1}{U_{jj}} \quad \text{for } j \in [1, i-1] \tag{4}$$
$$U_{ij} = A_{ij} - \sum_{k=1}^{i-1} L_{ik} U_{kj} \quad \text{for } j \in [i, N] \tag{5}$$
Observing the structures of L and U, we can see that there is no need to store L, U, and A separately. We can use only one matrix A to store all three matrices. During the factorization, the modified... |
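The row-wise recurrences (4) and (5) quoted in this citation context, together with its remark that one matrix can hold L, U, and A, can be illustrated with a minimal sketch. It assumes a dense matrix stored as a list of lists, no pivoting, and a unit diagonal for L (implied by the recurrences but not stored); the function name `lu_in_place` is hypothetical and is not taken from the cited work, which targets sparse bordered-diagonal-block matrices.

```python
def lu_in_place(a):
    """Row-wise LU factorization without pivoting, overwriting `a`.

    After the call, the strictly lower triangle of `a` holds L (its unit
    diagonal is implied, not stored) and the upper triangle, including the
    diagonal, holds U. Indices are 0-based here; the quoted equations are
    1-based.
    """
    n = len(a)
    for i in range(n):
        # Equation (4): entries of L on row i, for columns j < i.
        for j in range(i):
            if a[j][j] == 0:
                raise ZeroDivisionError("zero pivot; this sketch does no pivoting")
            partial = sum(a[i][k] * a[k][j] for k in range(j))
            a[i][j] = (a[i][j] - partial) / a[j][j]
        # Equation (5): entries of U on row i, for columns j >= i.
        for j in range(i, n):
            partial = sum(a[i][k] * a[k][j] for k in range(i))
            a[i][j] = a[i][j] - partial
    return a


if __name__ == "__main__":
    m = [[2, 1, 1], [4, 3, 3], [8, 7, 9]]
    print(lu_in_place(m))
```

Each row reads only entries that were finalized earlier (rows above it, or columns to its left on the same row), which is why the factors can safely overwrite A in place.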

80 | Quantitative analysis of floating point arithmetic on FPGA based custom computing machines - Shirazi, Walters, et al. - 1995 |

45 | Evaluation of the Streams-C C-to-FPGA Compiler: An Applications Perspective
- Frigo, Gokhale, et al.
- 2001
Citation Context ...owever, current compilers often require manual hardware/software partitioning and optimization, and the quality of the result in area requirements and system clock frequency is not often satisfactory =-=[10, 11]-=-. Dynamically reconfigurable datapaths also can be implemented with FPGAs. For example, a relatively simple co-processor for the acceleration of main computation loops in compute intensive application... |

35 | WSMP: Watson Sparse Matrix Package Part I – direct solution of symmetric sparse systems Version 1.0.0 - Gupta - 2000 |

32 | Efficient sparse LU factorization with partial pivoting on distributed memory architectures
- Fu, Jiao, et al.
- 1998
Citation Context ...or shared-memory machines such as the Cray C90 and J90, and IBM SP machines [5]. Good results for the S+ sparse LU solver have been obtained on distributed-memory machines such as the Cray T3D and T3E =-=[6]-=-. Real-time power flow analysis has many variations that are used frequently in the electrical power utilities industry [32]. First, for such utilities to monitor the performance of the network contin... |

32 |
Computer Analysis Methods for Power Systems
- Heydt
- 1996
Citation Context ...Section 6 contains our conclusions. 2. Parallel Bordered-Diagonal-Block Sparse LU Factorization. 2.1. Introduction to LU Factorization. This section presents an overview of the LU factorization problem =-=[4, 28, 29, 30]-=-. Solving the following system of N linear equations is the core computation of many engineering and scientific applications: A*x = b (1), where A is an N x N nonsingular matrix, x is a vector of N unkn... |

19 | Exploiting Operation Level Parallelism Through Dynamically Reconfigurable Datapaths,” Design Auto
- Huang, Malik
Citation Context ...s into FPGAs (often on the fly) as needed, the designer can achieve greater hardware functionality with the same hardware. FPGA-based (re)configurable systems can be used as specialized co-processors =-=[16]-=-, processor-attached functional units or independent processing machines [7], attached message routers in parallel machines [17], general-purpose processors for unconventional designs [17], and genera... |

15 |
High Performance Computing: Crays, Clusters, and Centers. What Next?
- Bell, Gray
- 2002
Citation Context ...stributed shared-memory multicomputers employing crossbar or multistage interconnection networks, and clusters of scalar uni- and multi-processor systems dominate the high-performance computing field =-=[1, 24]-=-. These parallel computers have accomplished a great deal of success in solving computation-intensive problems. However, their high price, their long design and development cycles, the difficulty of s... |

15 | A Parallel Gauss-Seidel Algorithm for Sparse Power Systems Matrices
- Koester, Ranka, et al.
- 1994
Citation Context ... Altera device is also presented. Our highly parallel LU factorization algorithm, namely the bordered-diagonal-block sparse matrix solver for sparse matrices having unknown (i.e., not fixed) structure =-=[2, 3]-=-, is very suitable for electrical power systems. Real electrical power systems are represented by very large sparse matrices having unknown structure, so we have adapted this algorithm for implementat... |

12 |
Reconfigurable Computing and
- Tessier, Burleson
- 2002
Citation Context ...sing, pattern recognition, etc. However, given the programmable nature of configurable devices, an ASIC implementation is generally faster by a factor of five to ten than its configurable counterpart =-=[8]-=-. Most of the configurable parallel-machine implementations currently reside on multi-FPGA systems interconnected via a specific network; ASIC components may also be present [7]. For example, Splash 2 ... |

11 | Migration in Single Chip Multiprocessors
- Shaw, Dally
- 2002
Citation Context ...uture as more designers attempt to use FPGAs for complex designs. Driven by expected advances predicted by Moore’s Law, many researchers have recently focused on the design of multiprocessor chips =-=[31]-=-. Such chips will often have to be prototyped on FPGAs. Fourth, our design does not give very good performance because it does not include a good FPU. The design of a very good superpipelined FPU is n... |

11 |
Parallel processing in power systems computation
- Tylavsky, Bose, et al.
- 1992
Citation Context ...ve been obtained on distributed-memory machines such as the Cray T3D and T3E [6]. Real-time power flow analysis has many variations that are used frequently in the electrical power utilities industry =-=[32]-=-. First, for such utilities to monitor the performance of the network continuously in order to identify disturbances, such as power station failures, broken lines, and line overcharge. Second, to spee... |

10 | A Universal, Dynamically Adaptable and Programmable Network Router for Parallel Computers - Golota, Ziavras |

10 |
Investigation of Various Mesh Architectures with Broadcast Buses for High Performance
- Ziavras
- 1999
Citation Context ...chines [7], attached message routers in parallel machines [17], general-purpose processors for unconventional designs [17], and general-purpose [16, 20] or specialized systems for parallel processing =-=[12, 19]-=-. In the past decade, FPGA-based configurable computing machines have acquired significant attention for improving the performance of algorithms in several fields, such as DSP, data communication, gen... |

10 |
Scalable Multifolded Hypercubes for Versatile Parallel Computers
- Ziavras
- 1995
Citation Context ...r-attached functional units or independent processing machines [7], attached message routers in parallel machines [17], general-purpose processors for unconventional designs [17], and general-purpose =-=[16, 20]-=- or specialized systems for parallel processing [12, 19]. In the past decade, FPGA-based configurable computing machines have acquired significant attention for improving the performance of algorithms... |

9 |
Parallel DSP Algorithms on TurboNet: An Experimental Hybrid Message-Passing/Shared-Memory Architecture
- Li, Ziavras, et al.
- 1996
Citation Context ...stributed shared-memory multicomputers employing crossbar or multistage interconnection networks, and clusters of scalar uni- and multi-processor systems dominate the high-performance computing field =-=[1, 24]-=-. These parallel computers have accomplished a great deal of success in solving computation-intensive problems. However, their high price, their long design and development cycles, the difficulty of s... |

9 |
Diakoptic and Generalized Hybrid Analysis
- Chen
- 1976
Citation Context ...ows and/or columns of matrix A(k) at each stage k are ordered in an effort to produce the minimum number of fill-ins. In our implementation, we use minimum degree ordering and node tearing algorithms =-=[3, 4, 40]-=- in order to get a near-optimal BDB matrix. 2.3. Parallel LU Factorization of a BDB Sparse Matrix. In our implementation, we use the BDB form for the matrix (see Figure 2) as our final form of ordering... |

8 | An Efficient Ordering Algorithm to Improve Sparse Vector Methods - Gomez, Franquello - 1988 |

7 |
Stream-oriented FPGA computing
- Gokhale, Stone, et al.
- 2000
Citation Context ...owever, current compilers often require manual hardware/software partitioning and optimization, and the quality of the result in area requirements and system clock frequency is not often satisfactory =-=[10, 11]-=-. Dynamically reconfigurable datapaths also can be implemented with FPGAs. For example, a relatively simple co-processor for the acceleration of main computation loops in compute intensive application... |

7 |
Efficient Mapping Algorithms for a Class of Hierarchical Systems
- Ziavras
- 1993
Citation Context ...e required hardware components onto FPGA resources, but application algorithms also have to be modified and mapped appropriately to the chosen FPGA resources in ways that yield acceptable performance =-=[21]-=-. Due to the difficulty of dealing with low-level hardware design, research groups have developed high-level language compilers to effectively map C/C++ code into VHDL code for targeted FPGAs [7-11, 2... |

6 |
Parallel LU Factorization of Block-Diagonal-Bordered Sparse Matrices
- Koester, Ranka, et al.
- 1994
Citation Context ... Altera device is also presented. Our highly parallel LU factorization algorithm, namely the bordered-diagonal-block sparse matrix solver for sparse matrices having unknown (i.e., not fixed) structure =-=[2, 3]-=-, is very suitable for electrical power systems. Real electrical power systems are represented by very large sparse matrices having unknown structure, so we have adapted this algorithm for implementat... |

6 |
Splash 2: FPGAs
- Buell, Arnold, et al.
- 1996
Citation Context ...chines [7], attached message routers in parallel machines [17], general-purpose processors for unconventional designs [17], and general-purpose [16, 20] or specialized systems for parallel processing =-=[12, 19]-=-. In the past decade, FPGA-based configurable computing machines have acquired significant attention for improving the performance of algorithms in several fields, such as DSP, data communication, gen... |

6 | Parallel Solution of Sparse Algebraic Equations - Lin, Ness - 1994 |

6 | A Message-Passing Distributed-Memory Parallel Power Flow Algorithm - Feng, Flueck |

5 | Dataflow Computation with Intelligent Memories Emulated on Field-Programmable Gate Arrays (FPGAs)
- Ingersoll, Ziavras
- 2002
Citation Context ...based (re)configurable systems can be used as specialized co-processors [16], processor-attached functional units or independent processing machines [7], attached message routers in parallel machines =-=[17]-=-, general-purpose processors for unconventional designs [17], and general-purpose [16, 20] or specialized systems for parallel processing [12, 19]. In the past decade, FPGA-based configurable computin... |

5 |
The Physiology of the Grid: An Open Services Architecture for Distributed Systems Integration. Draft of work in progress, http://www.globus.org/research/papers/ogsa.pdf
- Foster, Kesselman, et al.
Citation Context ...eer computing [1]. To make parallel computing available to the masses, all available Internet nodes in “grid computing” are candidates to solve large-scale problems in a distributed-computing fashion =-=[22]-=-. However, these approaches to high-performance computing are not viable for systems dedicated to a single application or for low-budget solutions. LU factorization is a direct method that can solve l... |

5 | Node Ordering Algorithms for Sparse Vector Method Improvement - Gomez, Franquello - 1988 |

4 |
Complexity of Matrix Partitioning Schemes for g-Inversion on the Connection Machine
- Krishnamurthy, Ziavras
- 1988
Citation Context ... A*x = b (1), where A is an N x N nonsingular matrix, x is a vector of N unknowns, and b is a given vector of length N. The solvers for this equation come mainly in two forms: direct [4] and iterative =-=[15]-=-. One of the classic direct methods is LU factorization, which works as follows. We first factorize A so that A = L*U (2), where L is a lower triangular matrix and U is an upper triangular matrix. Once t... |

4 | New Ordering Methods for Sparse Matrix Inversion via Diagonalization - Wang, Gooi - 1997 |

3 | VIP: An FPGA-Based - Cloutier, Cosatto, et al. - 1996 |