High-Performance Computing on a Honeycomb Architecture
Proc. Int’l ACPC Parallel Computation Conf., 1993
Cited by 8 (4 self)
We explore time and space optimization problems involved in the mapping of parallel algorithms onto a honeycomb architecture. When a well-known mapping is used, mapped algorithms generally exhibit execution slow-down and require too much area. We design several optimization techniques and enhance the mapping process. Experimental results show more than 50% saving in processor resources and 30% saving in execution time, on average. Because computing performance improves, the honeycomb architecture also becomes more widely applicable. Keywords: Hexagonally connected data-driven array, mapping algorithms, optimization methods, simulated annealing, threshold accepting. This work has been supported by the Ministry of Science and Technology of the Republic of Slovenia under Grant Number J2-5092. This report will appear in the proceedings of the 2nd Int'l Conf. Austrian Center for Parallel Computation, Gmunden, Austria, October 4-6, 1993. (Lecture Notes in Computer Science 734) Technical...
A data-flow processor for real-time low-level image processing
IEEE Custom Integrated Circuits Conference, 1991
Cited by 8 (6 self)
A chip featuring two coupled data-flow processors (DFPs) has been designed. It is to be mesh-connected into large processor arrays dedicated primarily to image processing. Each processor operates on 25 MBytes/s data-flows and performs up to 50 million 8- or 16-bit arithmetic operations per second. The chip has been processed in a 1 µm CMOS technology. It includes 160,000 transistors in an 84 mm² die area, is clocked at 25 MHz, and is packaged in a 144-pin PGA package. Our approach is to perform computations on the fly on a data-flow that comes from a digital video camera and to associate one physical operator to each involved in an algorithm. The set of available operators on the DFP has been defined to cover as widely as possible the range of low-level image processing functions.
A Functional Data-flow Architecture dedicated to Real-time Image Processing
1993
Cited by 5 (2 self)
This paper presents a data-flow computer developed at ETCA and dedicated to real-time image processing. Two types of data-driven processing elements, dedicated respectively to low- and mid-level processing, are integrated in a regular 3D array. Its design relies on a close integration of the data-flow architecture principles and the functional programming concept. Image processing data-flow graphs, first expressed using a functional syntax, are directly mapped onto the processor array. The programming environment includes a complete FP-specification to network configuration compilation stream along with a global operator database. An experimental system, including 1024 low-level custom data-flow processors (6 × 25 MBytes/s, 50 million operations per second) and 12 T800 transputers, was built and several image processing algorithms were run in real time at digital video speed. Keyword codes: C.1.3; D.1.1; I.4.0 Keywords: Data Flow Architectures; Functional Programming; Real Time ...
Design and Implementation of a Declarative Data-Parallel Language
1994
Cited by 5 (1 self)
This paper describes the language 8 1/2, an embedding of data-parallelism in a declarative framework.
Improved Schemes for Mapping Arbitrary Algorithms Onto Processor Meshes
1995
Cited by 4 (1 self)
We address the problem of efficient schemes for mapping arbitrary parallel algorithms onto distributed-memory message-passing multiprocessors with mesh topologies. We analyze a particular mapping scheme, find the reasons for its low efficiency, and show that mapped algorithms tend to be both wider and higher than necessary. As a result, they generally execute too slowly while at the same time occupying an excessive number of processors. Two approaches to the improvement of the scheme are presented, one direct and the other indirect. In the direct approach, we describe four nontrivial improvements of the scheme but also prove their NP-completeness. In contrast, in the indirect approach the original scheme is followed by a refinement procedure that incrementally improves the mapped algorithms. We describe four different heuristic refinement procedures. Experimental results show that the indirect approach offers a 51% saving in processor resources and, at the same time, a 36% saving in exe...
Fault tolerant mapping onto VLSI/WSI processor arrays
Proc. 20th Euromicro Conf., 1994
Cited by 3 (3 self)
This paper deals with efficient methods for mapping arbitrary parallel algorithms onto a faulty general-purpose VLSI/WSI data-driven array. First, a brief overview of several architectural designs of the array is given. Next, three directions for the algorithmic improvement of a certain mapping scheme are presented. None of these directions takes into account the possibility of defects in the array. Therefore, we present two methods which can be used to adapt any of the above algorithmic improvements for the case where defects are present in the array. In the first, the Map-onto-faulty-array method, faulty cells are taken into consideration during all the phases of the mapping/improvement process. In contrast, the second, the Map-and-correct method, initially ignores faulty cells and takes care of them in the correction phase following the mapping/improvement process. Keywords: Mapping, fault-tolerance, processor array, heuristic algorithms, optimization. This work has been supported by Min...
Advances in the Dataflow Computational Model
1999
Cited by 3 (0 self)
The dataflow program graph execution model, or dataflow for short, is an alternative to the stored-program (von Neumann) execution model. Because it relies on a graph representation of programs, the strengths of the dataflow model are very much the complements of those of the stored-program one. In the last thirty or so years since it was proposed, the dataflow model of computation has been used and developed in very many areas of computing research: from programming languages to processor design, and from signal processing to reconfigurable computing. This paper is a review of the current state-of-the-art in the applications of the dataflow model of computation. It focuses on three areas: multithreaded computing, signal processing and reconfigurable computing.
Using Parallel Simulated Annealing in the Mapping Problem
Proc. PARLE 94, 1994
Cited by 2 (2 self)
This paper presents a parallel simulated annealing algorithm for solving the problem of mapping irregular parallel programs onto homogeneous processor arrays with regular topology. The algorithm constructs and uses joint transformations. These transformations guarantee a high degree of parallelism that is bounded below by $\lceil |N_p| / (\deg(G_p)+1) \rceil$, where $|N_p|$ is the number of task nodes in the mapped program graph $G_p$ and $\deg(G_p)$ is the maximal degree of a node in $G_p$. The mapping algorithm provides good program mappings (in terms of program execution time and the number of processors used) in a reasonable number of steps. Keywords: Algorithm mapping, data-driven array, optimization, parallel simulated annealing. This work has been supported by the Ministry of Science and Technology of the Republic of Slovenia under Grant Numbers J2-5092-106 and Z2-5509-106. This report will appear in the proceedings of the 6th International PARLE Conference, Athens, Greece, July 13-17, 1994. (Lectur...
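The lower bound quoted in this abstract can be evaluated for a concrete program graph. The sketch below assumes the bound reads as the ceiling of |N_p| divided by (deg(G_p) + 1), which is the most plausible reading of the garbled formula in the listing, and it assumes an undirected graph stored as a dictionary of neighbour sets. The function name `parallelism_lower_bound` and the sample graph are hypothetical and do not come from the paper.

```python
def parallelism_lower_bound(adjacency):
    """Return ceil(|N_p| / (deg(G_p) + 1)) for an undirected graph.

    adjacency maps each task node to the set of its neighbours.
    This layout is an assumption made for illustration only.
    """
    node_count = len(adjacency)
    if node_count == 0:
        return 0
    # deg(G_p): the maximal degree over all nodes of the graph.
    max_degree = max(len(neighbours) for neighbours in adjacency.values())
    # Integer ceiling division avoids floating-point rounding.
    return -(-node_count // (max_degree + 1))


# Hypothetical program graph: a path of six task nodes, 0-1-2-3-4-5.
path_graph = {
    0: {1},
    1: {0, 2},
    2: {1, 3},
    3: {2, 4},
    4: {3, 5},
    5: {4},
}

bound = parallelism_lower_bound(path_graph)
print(bound)
```

For the six-node path the maximal degree is 2, so the bound evaluates to ceil(6 / 3) = 2. Under this reading, sparse program graphs give a larger guaranteed degree of parallelism than dense ones of the same size.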
Self Loop Pipelining and Reconfigurable Dataflow Arrays
Int’l Workshop on Systems, Architectures, MOdeling, and Simulation (SAMOS IV), Samos, 2004
Cited by 1 (1 self)
This paper presents some interesting concepts of static dataflow machines that can be used by reconfigurable computing architectures. We introduce some data-driven reconfigurable arrays and summarize techniques to map imperative software programs to those architectures, some of which are the focus of current research work. In particular, we briefly present a novel technique for pipelining loops. Experiments with the technique confirm important improvements over the use of conventional loop pipelining. Hence, the technique proves to be an efficient approach to map loops to coarse-grained reconfigurable architectures employing a static dataflow computational model.
Integrating Transputer Arrays within a Data-Flow Architecture: Applications in Real-time Image Processing
1994
Cited by 1 (0 self)
This paper presents a Data Flow Functional Computer (DFFC) developed at ETCA and dedicated to real-time image processing. One original feature of this computer lies in the integration, at both the hardware and software level, of two types of data-driven processing elements: 1024 custom Data Flow Processors (DFPs), embedded in a 3D interconnected network and dedicated to low-level processing, and 36 T800 Transputers, embedded in a 2D interconnected network and dedicated to mid- to high-level processing. A unifying programming model is provided, based on a close integration of the data-flow architecture principles and the functional programming concepts. An image processing algorithm, expressed using an FP-like functional syntax, is first converted into a Data-flow Graph (DFG). The nodes of this graph are real-time operators implementable on the physical processors of the data-flow machine. This DFG is then physically mapped onto the network of processors. The programming en...

