Parallel Matrix Multiplication on a Linear Array with a Reconfigurable Pipelined Bus System
- IEEE Transactions on Computers
, 1997
Cited by 21 (6 self)
The known fast sequential algorithms for multiplying two $N \times N$ matrices (over an arbitrary ring) have time complexity $O(N^\alpha)$, where $2 < \alpha < 3$. The current best value of $\alpha$ is less than 2.3755. We show that for all $1 \le p \le N^\alpha$, multiplying two $N \times N$ matrices can be performed on a $p$-processor linear array with a reconfigurable pipelined bus system (LARPBS) in $O\left(\frac{N^\alpha}{p} + \frac{N^2}{p^{2/\alpha}} \log p\right)$ time. This is currently the fastest parallelization of the best known sequential matrix multiplication algorithm on a distributed memory parallel system. In particular, for all $1 \le p \le N^{2.3755}$, multiplying two $N \times N$ matrices can be performed on a $p$-processor LARPBS in $O\left(\frac{N^{2.3755}}{p} + \frac{N^2}{p^{0.8419}} \log p\right)$ time, and linear speedup can be achieved for $p$ as large as $O(N^{2.3755}/(\log N)^{6.3262})$. Furthermore, multiplying two $N \times N$ matrices can be performed on an LARPBS with $O(N^\alpha)$ processors in $O(\log N)$ time. This compares favorably with...
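The two numeric exponents in this abstract can be checked from the general bound. The following is a worked sketch, not part of the original abstract. It writes $\alpha$ for the matrix multiplication exponent and assumes $\log p = \Theta(\log N)$ in the range of interest.

```latex
% Exponent of p in the second term, for \alpha = 2.3755:
\[
  \frac{2}{\alpha} = \frac{2}{2.3755} \approx 0.8419 .
\]
% Linear speedup holds when the first term dominates the second:
\[
  \frac{N^{\alpha}}{p} \;\ge\; \frac{N^{2}}{p^{2/\alpha}} \log p
  \;\iff\;
  p^{(\alpha-2)/\alpha} \log p \;\le\; N^{\alpha-2}
  \;\iff\;
  p \;\le\; \frac{N^{\alpha}}{(\log p)^{\alpha/(\alpha-2)}} .
\]
% With \log p = \Theta(\log N) and \alpha = 2.3755:
\[
  \frac{\alpha}{\alpha-2} = \frac{2.3755}{0.3755} \approx 6.3262,
  \qquad
  p = O\!\left(\frac{N^{2.3755}}{(\log N)^{6.3262}}\right).
\]
```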
Fast and processor efficient parallel matrix multiplication algorithms on a linear array with a reconfigurable pipelined bus system
- IEEE Trans. on Parallel and Distributed Systems
, 1998
Cited by 18 (9 self)
We present efficient parallel matrix multiplication algorithms for linear arrays with reconfigurable pipelined bus systems (LARPBS). Such systems are able to support a large volume of parallel communication of various patterns in constant time. An LARPBS can also be reconfigured into many independent subsystems and, thus, is able to support parallel implementations of divide-and-conquer computations like Strassen’s algorithm. The main contributions of the paper are as follows: We develop five matrix multiplication algorithms with varying degrees of parallelism on the LARPBS computing model, namely, MM1, MM2, MM3, and compound algorithms $\mathcal{C}_1(\epsilon)$ and $\mathcal{C}_2(\delta)$. Algorithm $\mathcal{C}_1(\epsilon)$ has adjustable time complexity at the sublinear level. Algorithm $\mathcal{C}_2(\delta)$ implies that it is feasible to achieve sublogarithmic time using $o(N^3)$ processors for matrix multiplication on a realistic system. Algorithms MM3, $\mathcal{C}_1(\epsilon)$, and $\mathcal{C}_2(\delta)$ all have $o(N^3)$ cost and, hence, are very processor efficient. Algorithms MM1, MM3, and $\mathcal{C}_1(\epsilon)$ are general-purpose matrix multiplication algorithms, where the array elements are in any ring. Algorithms MM2 and $\mathcal{C}_2(\delta)$ are applicable to array elements that are integers of bounded magnitude, or floating-point values of bounded precision and magnitude, or Boolean values. Extensions of algorithms MM2 and $\mathcal{C}_2(\delta)$ to unbounded integers and reals are also discussed.
Efficient Deterministic and Probabilistic Simulations of PRAMs on Linear Arrays with Reconfigurable Pipelined Bus Systems
- Journal of Supercomputing
, 2000
Cited by 13 (10 self)
In this paper, we present deterministic and probabilistic methods for simulating PRAM computations on linear arrays with reconfigurable pipelined bus systems (LARPBS). The following results are established in this paper. (1) Each step of a $p$-processor PRAM with $m = O(p)$ shared memory cells can be simulated by a $p$-processor LARPBS in $O(\log p)$ time, where the constant in the big-O notation is small. (2) Each step of a $p$-processor PRAM with $m = \Omega(p)$ shared memory cells can be simulated by a $p$-processor LARPBS in $O(\log m)$ time. (3) Each step of a $p$-processor PRAM can be simulated by a $p$-processor LARPBS in $O(\log p)$ time with probability larger than $1 - 1/p^c$ for all $c > 0$. (4) As an interesting byproduct, we show that a $p$-processor LARPBS can sort $p$ items in $O(\log p)$ time, with a small constant hidden in the big-O notation. Our results indicate that an LARPBS can simulate a PRAM very efficiently. Keywords: Concurrent read, concurrent write, deterministic simulation, linear array...
Scalable Parallel Matrix Multiplication on Distributed Memory Parallel Computers
- Journal of Parallel and Distributed Computing
, 2001
Integer Sorting and Routing in Arrays with Reconfigurable Optical Buses
, 1996
Cited by 7 (0 self)
In this paper we present deterministic algorithms for integer sorting and on-line packet routing on arrays with reconfigurable optical buses. The main objective is to identify the mechanisms specific to this type of architecture that allow us to build efficient algorithms for integer sorting, partial permutation routing, and h-relations. The consequences of these results for the complexity of PRAM simulation are also investigated.
Keywords: Optical pipelined buses, reconfigurable array, sorting, routing.
1. Introduction
In large-scale general purpose parallel machines based on connection networks, efficient communication capabilities are essential in order to solve most of the problems of interest in a timely manner. Interprocessor communication networks are often the main bottlenecks in parallel machines. One important limitation of these networks concerns the exclusive access to the bus resources, which limits throughput to a function of the end-to-end propagation time. Optical communicati...
Optimally scaling permutation routing on reconfigurable linear arrays with optical buses
- In Second Merged Symposium IPPS/SPDP, 13th International Parallel Processing Symposium & 10th Symposium on Parallel and Distributed Processing
, 2000
Cited by 4 (1 self)
We present an optimal and scalable permutation routing algorithm for three reconfigurable models based on linear arrays that allow pipelining of information through an optical bus. Specifically, for any $P \le N$, our algorithm routes any permutation of $N$ elements on a $P$-processor model optimally in $O(N/P)$ steps. This algorithm extends naturally to one for routing h-relations optimally in $O(h)$ steps. We also establish the equivalence of the three models: linear array with a reconfigurable pipelined bus system,
Efficient Parallel Algorithms for Distance Maps of 2-D Binary Images Using an Optical Bus
, 2002
Cited by 4 (3 self)
Computing a distance map (distance transform) is an operation that converts a two-dimensional (2-D) image consisting of black and white pixels to an image where each pixel has a value or a pair of coordinates that represents the distance to or location of the nearest black pixel. It is a basic operation in image processing and computer vision fields, and is used for expanding, shrinking, thinning, segmentation, clustering, computing shape, object reconstruction, etc. This paper examines the possibility of implementing the problem of finding a distance map for an image efficiently using an optical bus. The computational model considered is the linear array with a reconfigurable pipelined bus system (LARPBS), which has been introduced recently based on current electronic and optical technologies. It is shown that the problem for an image can be implemented in (log log log ) bus cycles deterministically or in (log ) bus cycles with high probability on an LARPBS with processors. By high probability, we mean a probability of (1 ) for any constant 1. We also show that the problem can be solved in (log log ) bus cycles deterministically or in (1) bus cycles with high probability on an LARPBS with 3 processors. Scalability of the algorithms is also discussed briefly. The same problem can be solved using an LARPBS of processors in (( ) log log log ) time deterministically or in (( ) log ) time with high probability for any practical machine size of . For processor arrays with practical sizes, a bus cycle is roughly the time of an arithmetic operation. Hence, the algorithm compares favorably to the best known parallel algorithms for the same problem in the literature.
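The distance map defined in this abstract can be illustrated with a minimal sequential sketch. This is a brute-force scan, not the LARPBS algorithm of the paper. The function name `distance_map`, the 0/1 pixel encoding (1 = black), and the choice of squared Euclidean distance are all assumptions made for illustration.

```python
from typing import List, Optional


def distance_map(image: List[List[int]]) -> List[List[Optional[int]]]:
    """Brute-force distance map of a binary image.

    Assumptions (not from the paper): 1 encodes a black pixel, 0 a white
    pixel, and the distance reported is the squared Euclidean distance to
    the nearest black pixel. Cells are None if the image has no black pixel.
    """
    # Collect the coordinates of every black pixel once.
    black = [
        (r, c)
        for r, row in enumerate(image)
        for c, value in enumerate(row)
        if value == 1
    ]
    result: List[List[Optional[int]]] = []
    for r, row in enumerate(image):
        out_row: List[Optional[int]] = []
        for c in range(len(row)):
            if not black:
                out_row.append(None)
            else:
                # Minimum squared distance over all black pixels.
                out_row.append(
                    min((r - br) ** 2 + (c - bc) ** 2 for br, bc in black)
                )
        result.append(out_row)
    return result
```

For an image with $n^2$ pixels and $b$ black pixels this sketch takes $O(n^2 b)$ sequential time, so it serves only as a reference definition against which faster algorithms can be checked.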
An Improved Randomized Selection Algorithm With an Experimental Study
- In Proc. The 2nd Workshop on Algorithm Engineering and Experiments (ALENEX00)
, 2000
Cited by 3 (3 self)
This paper presents an efficient randomized high-level parallel algorithm for finding the median given a set of elements distributed across a parallel machine. In fact, our algorithm solves the general selection problem that requires the determination of the element of rank k, for an arbitrarily given integer k. Our general...
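The selection problem stated in this abstract (find the element of rank k) can be illustrated with a minimal sequential sketch of randomized selection, often called quickselect. It is not the parallel algorithm of the paper. The function name `select`, the 1-indexed rank convention, and the optional `rng` parameter are assumptions made for illustration.

```python
import random
from typing import Optional, Sequence


def select(items: Sequence[int], k: int, rng: Optional[random.Random] = None) -> int:
    """Return the element of rank k (the k-th smallest, 1-indexed).

    Sequential randomized selection: pick a random pivot, keep only the
    side of the partition that contains the requested rank, and repeat.
    """
    if not 1 <= k <= len(items):
        raise ValueError("k must satisfy 1 <= k <= len(items)")
    rng = rng or random.Random()
    pool = list(items)
    while True:
        pivot = rng.choice(pool)
        smaller = [x for x in pool if x < pivot]
        equal_count = sum(1 for x in pool if x == pivot)
        if k <= len(smaller):
            # The target lies strictly below the pivot.
            pool = smaller
        elif k <= len(smaller) + equal_count:
            # The target is the pivot itself (or a duplicate of it).
            return pivot
        else:
            # The target lies above the pivot; adjust the rank accordingly.
            k -= len(smaller) + equal_count
            pool = [x for x in pool if x > pivot]
```

The median of a list `xs` is then `select(xs, (len(xs) + 1) // 2)`. Each round discards at least the pivot, so the loop always terminates.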
Sublogarithmic Deterministic Selection on Arrays with a Reconfigurable Optical Bus
- IEEE Trans. on Computers
, 2002
Cited by 3 (0 self)
The Linear Array with a Reconfigurable Pipelined Bus System (LARPBS) is a newly introduced parallel computational model, where processors are connected by a reconfigurable optical bus. In this paper, we show that the selection problem can be solved on the LARPBS model deterministically in $O((\log \log N)^2 / \log \log \log N)$ time. To the best of our knowledge, this is the best deterministic selection algorithm on any model with a reconfigurable optical bus.
Reconfigurable architectures and algorithms: A research survey
- IJCSA
, 2009
Cited by 2 (0 self)
Ever since the introduction of dynamically reconfigurable buses, the architecture has gained considerable popularity among researchers for high-performance computing with general-purpose processors. It is a powerful model of computation in which the communication pattern between processors can be changed during execution. In the following years, several new architectures and efficient algorithms for them were proposed, and their implementations using FPGAs have been shown. This paper presents a survey of the different architectures proposed and a few important algorithms presented for these specialized architectures over the last two decades. Keywords: PARBS, R-MESH, RN, LARPBS, Polymorphic Torus Network, AROB.

