Solution of Partial Differential Equations on Vector Computers
 Proc. 1977 Army Numerical Analysis and Computers Conference
, 1977
Abstract

Cited by 53 (0 self)
In this paper we review the present status of numerical methods for partial differential equations on vector and parallel computers. A discussion of the relevant aspects of these computers and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations, as well as explicit and implicit methods for initial-boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. A brief discussion of application areas utilizing these computers is included.
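The explicit methods the abstract mentions are the ones that map most directly onto vector hardware, because every grid point is updated independently from the old time level. As a minimal sketch (using a hypothetical 1-D heat equation, not an example from the paper), one explicit step is a single whole-array operation:

```python
import numpy as np

def explicit_heat_step(u, r):
    """One explicit finite-difference step for u_t = u_xx on a uniform grid.

    r = dt / dx**2 (stable for r <= 0.5).  Every interior point is updated
    independently from the old values, so the whole sweep is one vector
    operation -- the property that makes explicit schemes attractive on
    vector computers.
    """
    v = u.copy()
    v[1:-1] = u[1:-1] + r * (u[2:] - 2.0 * u[1:-1] + u[:-2])
    return v

# Fixed (Dirichlet) boundaries, a single hot spot in the middle.
u = np.zeros(9)
u[4] = 1.0
u = explicit_heat_step(u, 0.25)
```

Implicit methods, by contrast, couple all unknowns through a linear solve each step, which is where the survey's discussion of direct and iterative solvers comes in.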
Communication-Efficient Parallel Algorithms for Distributed Random-Access Machines
 Algorithmica
, 1988
Abstract

Cited by 37 (2 self)
This paper introduces a model for parallel computation, called the distributed random-access machine (DRAM), in which the communication requirements of parallel algorithms can be evaluated. A DRAM is an abstraction of a parallel computer in which memory accesses are implemented by routing messages through a communication network. A DRAM explicitly models the congestion of messages across cuts of the network. We introduce the notion of a conservative algorithm as one whose communication requirements at each step can be bounded by the congestion of pointers of the input data structure across cuts of a DRAM. We give a simple lemma that shows how to "shortcut" pointers in a data structure so that remote processors can communicate without causing undue congestion. We give O(lg n)-step, linear-processor, linear-space, conservative algorithms for a variety of problems on n-node trees, such as computing tree-walk numberings, finding the separator of a tree, and evaluating all subexpressions ...
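The pointer "shortcutting" the abstract describes is, at its core, pointer jumping: each node replaces its parent pointer with its grandparent's, so that after O(lg n) rounds every node points directly at its root. A sequential sketch of that parallel recurrence (not the paper's DRAM formulation; each round reads only the old pointers):

```python
def shortcut_pointers(parent):
    """Pointer jumping: each round, every node's parent pointer is replaced
    by its grandparent's.  Chains halve each round, so after O(lg n) rounds
    every node points directly at its tree's root (roots point to
    themselves).  All updates in a round read the pre-round pointers,
    mimicking one parallel step.
    """
    parent = list(parent)
    changed = True
    while changed:
        new = [parent[parent[i]] for i in range(len(parent))]
        changed = new != parent
        parent = new
    return parent

# A chain 3 -> 2 -> 1 -> 0, with node 0 the root.
roots = shortcut_pointers([0, 0, 1, 2])
```

Shortening long pointer chains this way is what lets remote processors communicate without routing traffic repeatedly across the same congested cut.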
Efficient Hardware Data Mining with the Apriori Algorithm on FPGAs
Abstract

Cited by 19 (1 self)
The Apriori algorithm is a popular correlation-based data-mining kernel. However, it is a computationally expensive algorithm and the running times can stretch up to days for large databases, as database sizes can extend to gigabytes. Through the use of a new extension to the systolic array architecture, the time required for processing can be significantly reduced. Our array architecture implementation on a Xilinx Virtex-II Pro 100 provides a performance improvement that can be orders of magnitude faster than the state-of-the-art software implementations. The system is easily scalable and introduces an efficient "systolic injection" method for intelligently reporting unpredictably generated mid-array results to a controller without any chance of collision or excessive stalling.
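For context, this is the software kernel being accelerated: Apriori mines frequent itemsets level by level, and the support-counting loop over every (candidate, transaction) pair is the part a systolic array can pipeline. A minimal software sketch (not the paper's hardware design):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise Apriori: count support for k-item candidates, keep the
    frequent ones, and join them into (k+1)-item candidates.  The nested
    support-counting loop dominates the cost and is the target of the
    paper's systolic-array acceleration.
    """
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    k_sets = [frozenset([i]) for i in items]
    k = 1
    while k_sets:
        counts = {c: sum(1 for t in transactions if c <= t) for c in k_sets}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Candidate generation: unions of frequent k-sets that have size k+1.
        keys = list(level)
        k_sets = sorted({a | b for a, b in combinations(keys, 2)
                         if len(a | b) == k + 1}, key=sorted)
        k += 1
    return frequent

txns = [frozenset(t) for t in
        (["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"])]
freq = apriori(txns, 3)
```

With five transactions and a support threshold of 3, the singletons and all three pairs are frequent, while {a, b, c} (support 2) is pruned.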
A Model of Computation for VLSI with Related Complexity Results
 Journal of the ACM
, 1985
Abstract

Cited by 14 (0 self)
Abstract. A new model of computation for VLSI, based on the assumption that the time for propagating information is at least linear in the distance, is proposed. While accommodating basic laws of physics, the model is designed to be general and technology-independent. Thus, from a complexity viewpoint, it is especially suited for deriving lower bounds and tradeoffs. New results for a number of problems, including fan-in, transitive functions, matrix multiplication, and sorting are presented. As regards upper bounds, it must be noted that, because of communication costs, the model clearly favors regular and pipelined architectures (e.g., systolic arrays).
Advanced Copy Propagation for Arrays
 In Languages, Compilers, and Tools for Embedded Systems LCTES’03
, 2003
Abstract

Cited by 13 (5 self)
The focus of this paper is on a data-flow transformation called advanced copy propagation. After an array is assigned, we can, under certain conditions, replace a read from this array by the right-hand side of the assignment. If so, the intermediate assignment can be skipped; in case it becomes dead code, it can be eliminated. Where necessary we distinguish between the different elements of arrays as well as the different run-time instances of statements, allowing us to do propagation over global loop and condition scopes. We have formalized two basic operations: non-recursive propagation, which operates on two statements, and recursive propagation, which operates on one statement. A global algorithm uses these two operations to do propagation on code involving any number of statements. Running our prototype implementation on some multimedia kernels shows that we can get a decrease in memory accesses between 22% and 43%.
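The transformation is easiest to see on a toy example. Below, a made-up before/after pair (not taken from the paper) shows a read of array b replaced by the right-hand side that defined it, after which b is dead and can be eliminated:

```python
def before(n):
    """Original: intermediate array b copies values derived from a,
    and c then reads b."""
    a = [i * i for i in range(n)]
    b = [0] * n
    for i in range(n):
        b[i] = a[i] + 1          # array assignment
    c = [0] * n
    for i in range(n):
        c[i] = b[i] * 2          # read from b
    return c

def after(n):
    """After copy propagation: each read b[i] is replaced by the
    right-hand side of its defining assignment, (a[i] + 1); the
    intermediate array b becomes dead code and is eliminated,
    removing one full array's worth of memory accesses."""
    a = [i * i for i in range(n)]
    c = [0] * n
    for i in range(n):
        c[i] = (a[i] + 1) * 2    # propagated right-hand side
    return c
```

Distinguishing individual array elements and individual loop iterations, as the paper does, is what makes the replacement legal even when the definition and the read sit in different loops.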
Parallel algorithms in linear algebra
 Computer Sciences Laboratory, ANU
, 1991
Abstract

Cited by 12 (3 self)
This paper provides an introduction to algorithms for fundamental linear algebra problems on various parallel computer architectures, with the emphasis on distributed-memory MIMD machines. To illustrate the basic concepts and key issues, we consider the problem of parallel solution of a nonsingular linear system by Gaussian elimination with partial pivoting. This problem has come to be regarded as a benchmark for the performance of parallel machines. We consider its appropriateness as a benchmark, its communication requirements, and schemes for data distribution to facilitate communication and load balancing. In addition, we describe some parallel algorithms for orthogonal (QR) factorization and the singular value decomposition (SVD).
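As a sequential reference for the benchmark kernel the abstract discusses (a sketch, not the paper's parallel formulation), Gaussian elimination with partial pivoting looks as follows; in the distributed-memory setting, the rows or columns of A would be dealt out cyclically across processors so that work stays balanced as the active submatrix shrinks:

```python
import numpy as np

def ge_partial_pivot(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.
    At step k, the row with the largest |entry| in column k is swapped
    into the pivot position before eliminating below it."""
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    for k in range(n - 1):
        p = k + int(np.argmax(np.abs(A[k:, k])))   # pivot row
        if p != k:
            A[[k, p]] = A[[p, k]]
            b[[k, p]] = b[[p, k]]
        m = A[k + 1:, k] / A[k, k]                 # multipliers
        A[k + 1:, k:] -= np.outer(m, A[k, k:])     # rank-1 update
        b[k + 1:] -= m * b[k]
    # Back substitution on the resulting upper-triangular system.
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

A = np.array([[2.0, 1.0], [1.0, 3.0]])
x = ge_partial_pivot(A, np.array([3.0, 5.0]))
```

The pivot search and the row broadcast at each step are exactly the communication costs the paper analyzes.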
VIP: An FPGA-based Processor for Image Processing and Neural Networks
, 1996
Abstract

Cited by 10 (0 self)
We present in this paper the architecture and implementation of the Virtual Image Processor (VIP), a SIMD multiprocessor built with large FPGAs. The SIMD architecture, together with a 2-D torus connection topology, is well suited for image processing, pattern recognition, and neural network algorithms. The VIP board can be programmed on-line at the logic level, allowing optimal hardware dedication to any given algorithm.
Floating Point Fault Tolerance with Backward Error Assertions
 IEEE Trans. Computers
, 1995
Some Linear-Time Algorithms for Systolic Arrays
, 2000
Abstract

Cited by 10 (6 self)
We survey some recent results on linear-time algorithms for systolic arrays. In particular, we show how the greatest common divisor (GCD) of two polynomials of degree n over a finite field can be computed in time O(n) on a linear systolic array of O(n) cells; similarly for the GCD of two n-bit binary numbers. We show how n by n Toeplitz systems of linear equations can be solved in time O(n) on a linear array of O(n) cells, each of which has constant memory size (independent of n). Finally, we outline how a two-dimensional square array of O(n) by O(n) cells can be used to solve (to working accuracy) the eigenvalue problem for a symmetric real n by n matrix in time O(nS(n)). Here S(n) is a slowly growing function of n; for practical purposes S(n) can be regarded as a constant. In addition to their theoretical interest, these results have potential applications in the areas of error-correcting codes, symbolic and algebraic computation, signal processing and image processing. For example, systolic GCD arrays for error correction have been implemented with the microprogrammable "PSC" chip.
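The n-bit GCD result rests on recurrences that use only shifts and subtractions, the bit-level operations a linear array of O(n) cells can pipeline. A sequential sketch of the binary GCD recurrence (illustrating the arithmetic, not the systolic schedule itself):

```python
def binary_gcd(a, b):
    """Binary GCD of two nonnegative integers using only comparisons,
    subtractions, and shifts -- no division.  These constant-time
    bit-level steps are what a linear systolic array of O(n) cells can
    pipeline to compute an n-bit GCD in O(n) total time."""
    if a == 0:
        return b
    if b == 0:
        return a
    shift = 0
    while (a | b) & 1 == 0:          # factor out common powers of two
        a, b, shift = a >> 1, b >> 1, shift + 1
    while b:
        while b & 1 == 0:            # discard factors of two from b
            b >>= 1
        if a > b:
            a, b = b, a
        b -= a                       # difference of two odd numbers is even
    return a << shift

g = binary_gcd(48, 180)
```

The polynomial-GCD array works analogously, with coefficient operations over the finite field in place of bit shifts.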
Convergence of Iteration Systems
, 1993
Abstract

Cited by 10 (0 self)
An iteration system is a set of assignment statements whose computation proceeds in steps: at each step, an arbitrary subset of the statements is executed in parallel. The set of statements thus executed may differ at each step; however, it is required that each statement is executed infinitely often along the computation. The convergence of such systems (to a fixed point) is typically verified by showing that the value of a given variant function is decreased by each step that causes a state change. Such a proof requires an exponential number of cases (in the number of assignment statements) to be considered. In this paper, we present alternative methods for verifying the convergence of iteration systems. In most of these methods, up to a linear number of cases need to be considered.
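The execution model above can be sketched concretely. In this made-up example (not from the paper), two assignments fire in arbitrary parallel subsets; fairness is approximated by cycling through every nonempty subset, and the variant function sum(x) decreases on each state-changing step, so every fair execution reaches the fixed point:

```python
from itertools import combinations

def run_iteration_system(x, assignments, sweeps=10):
    """Execute an iteration system: at each step some subset of the
    assignment statements fires in parallel (all reads see the pre-step
    state).  Cycling through every nonempty subset approximates the
    fairness requirement that each statement executes infinitely often.
    Each assignment maps the state to a (variable_index, new_value) pair."""
    subsets = [c for r in range(1, len(assignments) + 1)
               for c in combinations(assignments, r)]
    for _ in range(sweeps):
        for subset in subsets:
            updates = [f(x) for f in subset]   # one parallel step
            for var, val in updates:
                x[var] = val
    return x

# Hypothetical system: x0 := min(x0, x1);  x1 := min(x1, x2).
# sum(x) is a variant function: it strictly decreases on every step that
# changes the state, so the system converges to a fixed point.
state = run_iteration_system(
    [5, 3, 1],
    [lambda x: (0, min(x[0], x[1])),
     lambda x: (1, min(x[1], x[2]))])
```

Checking that a variant decreases under every one of the 2^n - 1 possible firing subsets is exactly the exponential case analysis the paper's methods reduce to a linear number of cases.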