A Framework for Exploiting Task- and Data-Parallelism on Distributed Memory Multicomputers
- IEEE Transactions on Parallel and Distributed Systems
, 1997
Cited by 30 (0 self)
Distributed memory multicomputers offer significant advantages over shared memory multiprocessors in terms of cost and scalability. Unfortunately, using all the available computational power in these machines involves a tremendous programming effort on the part of users, which creates a need for sophisticated compiler and run-time support for distributed memory machines. In this paper, we explore a new compiler optimization for regular scientific applications: the simultaneous exploitation of task and data parallelism. Our optimization is implemented as part of the PARADIGM HPF compiler framework we have developed. The intuitive idea behind the optimization is the use of task parallelism to control the degree of data parallelism of individual tasks. This improves performance because data parallelism provides diminishing returns as the number of processors increases. By controlling the number of processors used for each data parallel task in an application and by executing these tasks concurrently, we make program execution more efficient and, therefore, faster. A practical implementation of a task and data parallel scheme of execution for an application on a distributed memory multicomputer also involves data redistribution, which adds overhead. However, as our experimental results show, this overhead is not a problem; execution of a program using task and data parallelism together can be significantly faster than its execution using data parallelism alone. This makes our proposed optimization practical and extremely useful.
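The diminishing-returns argument in this abstract can be illustrated with a small calculation. The sketch below is not the PARADIGM cost model: it assumes a hypothetical per-task cost of work/procs plus an overhead that grows linearly with the processor count, and a fixed, made-up redistribution cost. The function names and all numbers are illustrative assumptions.

```python
def task_time(work, procs, overhead_per_proc):
    """Hypothetical cost model: perfectly divisible work plus an overhead
    term that grows linearly with the number of processors used."""
    return work / procs + overhead_per_proc * procs


def data_parallel_only(works, total_procs, overhead_per_proc):
    # Run the tasks one after another, each spread over every processor.
    return sum(task_time(w, total_procs, overhead_per_proc) for w in works)


def task_and_data_parallel(works, total_procs, overhead_per_proc,
                           redistribution=0.0):
    # Run independent tasks concurrently, each on an equal share of the
    # processors (assumes len(works) <= total_procs), then pay an assumed
    # fixed cost for redistributing data between the processor groups.
    share = total_procs // len(works)
    slowest = max(task_time(w, share, overhead_per_proc) for w in works)
    return slowest + redistribution


if __name__ == "__main__":
    works = [64, 64]  # two independent tasks of equal size
    print(data_parallel_only(works, 16, 0.5))           # 24.0
    print(task_and_data_parallel(works, 16, 0.5, 2.0))  # 14.0
```

Under these assumed numbers, running both tasks concurrently on 8 processors each beats running them back to back on all 16, even after paying the redistribution cost. With a smaller overhead term the gap narrows or reverses, which is why the choice depends on the cost model.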
A Convex Programming Approach for Exploiting Data and Functional Parallelism on Distributed Memory Multicomputers
, 1994
Cited by 29 (8 self)
Compilers have focussed on the exploitation of one of functional or data parallelism in the past. The PARADIGM compiler project at the University of Illinois is among the first to incorporate techniques for simultaneous exploitation of both. The work in this paper describes the techniques used in the PARADIGM compiler and analyzes the optimality of these techniques. It is the first of its kind to use realistic cost models and includes data transfer costs which all previous researchers have neglected. Preliminary results on the CM-5 show the efficacy of our methods and the significant advantages of using functional and data parallelism together for execution of real applications. 1. INTRODUCTION Distributed memory multicomputers such as the Intel Paragon, the IBM SP-1 and the Thinking Machines CM-5 offer significant advantages over shared memory multiprocessors in terms of cost and scalability. Unfortunately, to extract all that computational power from these machines, users have to write eff...
Expected Length of Longest Common Subsequences
Cited by 17 (2 self)
Contents
1 Introduction
2 Notation and preliminaries
2.1 Notation and basic definitions
2.2 Longest common subsequences
2.3 Computing longest common subsequences
2.4 Expected length of longest common subsequences
3 Lower Bounds
3.1 Css machines
3.2 Analysis of css machines
3.3 Design of css machines
3.4 Labeled css machines
4 Upper bounds
4.1 Collations
4.2 Previous upper bounds
4.3 Simple upper bound (binary alphabet)
4.4 Simple upper bound (alphabet size 3)
4.5 Upper bounds for binary alphabet
A Fast Multilayer General Area Router for MCM Designs
, 1992
Cited by 16 (4 self)
The objective of this research is to develop an efficient multilayer general area router as an alternative to the three-dimensional (3D) maze router for solving the multilayer MCM routing problem. Our router, named SLICE, is independent of net ordering, requires much shorter computation time, and uses fewer vias. A key step in our router is to compute a maximum non-crossing bipartite matching, which is solved optimally in O(n log n) time where n is the number of possible connections. We tested our router on a number of examples, including two MCM designs from MCC. The total wirelength used by SLICE is only a few percent away from the optimal. Compared with a 3D maze router, SLICE is four times faster and uses 28% fewer vias. A more important feature is that SLICE works on only a "thin slice" of the two-layer routing grids at a time, while a 3D maze router works on the entire three-dimensional routing grid. Therefore, SLICE can successfully produce solutions for large MCM routing examples where 3D maze routers fail due to insufficient memory.
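The abstract does not say how SLICE computes its matching. As a rough illustration of how an O(n log n) bound can arise, the sketch below uses a standard reduction for the unweighted case: sort the candidate connections and take a longest strictly increasing subsequence. The edge representation (a pair of integer positions on two parallel rows) and the function name are assumptions for illustration, not taken from the paper.

```python
from bisect import bisect_left


def max_noncrossing_matching_size(edges):
    """Size of a largest set of edges (u, v) that pairwise neither cross
    nor share an endpoint, where u is a position on one row and v a
    position on the other row.

    Standard reduction (an assumption here, not necessarily the SLICE
    method): sort by u ascending and v descending, then find the longest
    strictly increasing subsequence of the v values.
    """
    # Descending v within equal u prevents picking two edges with the same u.
    ordered = sorted(edges, key=lambda e: (e[0], -e[1]))
    tails = []  # tails[k] = smallest last v of an increasing run of length k+1
    for _, v in ordered:
        i = bisect_left(tails, v)  # strict increase: equal v replaces, not extends
        if i == len(tails):
            tails.append(v)
        else:
            tails[i] = v
    return len(tails)


if __name__ == "__main__":
    candidates = [(1, 3), (2, 1), (3, 2), (4, 4)]
    print(max_noncrossing_matching_size(candidates))  # 3
```

Sorting costs O(n log n) and each of the n binary searches costs O(log n), which gives the same asymptotic bound the abstract quotes. A weighted variant would need a different data structure.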
Register binding and port assignment for multiplexer optimization
- in Proc. the Asia Pacific Design Automation Conference
, 2004
Cited by 16 (9 self)
Data path connection elements, such as multiplexers, consume a significant amount of area on a VLSI chip, especially for FPGA designs. Multiplexer optimization is a difficult problem because both register binding and port assignment to reduce total multiplexer connectivity during high-level synthesis are NP-complete problems. In this paper, we first formulate a k-cofamily-based register binding algorithm targeting the multiplexer optimization problem. We then further reduce the multiplexer width through an efficient port assignment algorithm. Experimental results show that we are 44% better overall than the left-edge register binding algorithm on the total usage of multiplexer inputs and 7% better than a bipartite graph-based algorithm. For large designs, we are able to achieve significantly better results consistently. After technology mapping, placement and routing for an FPGA architecture, it shows considerably positive impacts on chip area, delay and power consumption.
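For context on the baseline named in this abstract, the following is a minimal sketch of the classic left-edge approach to register binding. It is not the paper's k-cofamily algorithm. It assumes each variable has a half-open lifetime interval (start, end); the function name and the sample lifetimes are hypothetical.

```python
def left_edge_binding(lifetimes):
    """Bind variables to registers with the left-edge heuristic.

    lifetimes: dict mapping a variable name to a half-open lifetime
    interval (start, end). Returns a list of registers, each a list of
    the variable names that share it.
    """
    # Sort once by left edge (start of lifetime).
    remaining = sorted(lifetimes.items(), key=lambda kv: kv[1][0])
    registers = []
    while remaining:
        current, last_end, deferred = [], None, []
        for name, (start, end) in remaining:
            if last_end is None or start >= last_end:
                # Lifetime begins after the previous one ends: reuse register.
                current.append(name)
                last_end = end
            else:
                # Overlaps with this register's contents: try the next one.
                deferred.append((name, (start, end)))
        registers.append(current)
        remaining = deferred
    return registers


if __name__ == "__main__":
    sample = {"a": (0, 3), "b": (1, 4), "c": (3, 6), "d": (4, 7), "e": (6, 8)}
    print(left_edge_binding(sample))  # [['a', 'c', 'e'], ['b', 'd']]
```

This heuristic minimizes the number of registers but ignores which functional units read and write each variable, so it does not account for multiplexer inputs. That is the cost the abstract says its binding algorithm targets.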
Algebras For Object-Oriented Query Languages
, 1993
Cited by 15 (0 self)
Data Types New base types can be added to the EXTRA data model via the EXTRA abstract data type facility. To add a new ADT, the person responsible for adding the type begins by writing (and debugging) the code for the type in the E programming language. E is an extension of C++ [Stro86] that was developed as part of the EXODUS project. E serves as the implementation language for access methods and operators for systems developed using EXODUS. It is also the target language for the query compiler, and (most importantly for our purposes here) the language in which base type extensions will be defined. E extends C++ with a number of features to aid programmers in database system programming, including "dbclasses" for persistent storage, class generators for implementing "generic" classes and functions, iterators for use as a control abstraction in writing set operations, and built-in class generators for typed files and variable-length arrays [Rich87]. Suppose that we wanted to add...
Processor Allocation and Scheduling of Macro Dataflow Graphs on Distributed Memory Multicomputers by the PARADIGM Compiler
- In Proceedings of the 1993 International Conference on Parallel Processing, volume II-Software
, 1993
Cited by 13 (4 self)
Functional or Control parallelism is an effective way to increase speedups in Multicomputers. Programs for these machines are represented by Macro Dataflow Graphs (MDGs) for the purpose of functional parallelism analysis and exploitation. Algorithms for allocation and scheduling of MDGs have been discussed along with some analysis of their optimality. These algorithms attempt to minimize the execution time of any given MDG through exploitation of functional parallelism. Our preliminary results show their effectiveness over naive algorithms. Keywords: Macro Dataflow Graphs, Distributed Memory Multicomputers, Allocation and Scheduling, Parallelizing Compilers, Optimization. 1 Introduction Distributed Memory Multicomputers offer significant advantages over shared memory multiprocessors in terms of cost and scalability. Unfortunately, writing efficient software for them is an extremely laborious process for users. The PARADIGM compiler project at Illinois is aimed at devising a paral...
Visual Algorithm Simulation
, 2003
Cited by 11 (6 self)
Understanding data structures and algorithms, both of which are abstract concepts, is an integral part of software engineering and elementary computer science education. However, people usually have difficulty in understanding abstract concepts and processes such as procedural encoding of algorithms and data structures. One way to improve their understanding is to provide visualizations to make the abstract concepts more concrete. This thesis presents the design, implementation and evaluation for the Matrix application framework that occupies a unique niche between the following two domains. In the first domain, called algorithm animation, abstractions of the behavior of fundamental computer program operations are visualized. In the second domain, called algorithm simulation, the framework for exploring and understanding algorithms and data structures is exhibited. First, an overview and theoretical basis for the application framework is presented. Second, the different roles are defined and examined for realizing the idea of algorithm
A Framework for Exploiting Data and Functional Parallelism on Distributed Memory Multicomputers
, 1994
Cited by 9 (2 self)
Recent research efforts have shown the benefits of integrating functional and data parallelism over using either pure data parallelism or pure functional parallelism. The work in this paper presents a theoretical framework for deciding on a good execution strategy for a given program based on the available functional and data parallelism in the program. The framework is based on assumptions about the form of computation and communication cost functions for multicomputer systems. We present mathematical functions for these costs and show that these functions are realistic. The framework also requires specification of the available functional and data parallelism for a given problem. For this purpose, we have developed a graphical programming tool. Currently, we have tested our approach using three benchmark programs on the Thinking Machines CM-5 and Intel Paragon. Results presented show that the approach is very effective and can provide a two- to three-fold increase in speedups over ap...
On the k-Layer Planar Subset and Topological Via Minimization Problems
- IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
, 1991
Cited by 8 (4 self)
An important problem in performance-driven routing is the k-layer planar subset problem, which is to choose a maximum (weighted) subset of nets such that each net in the subset can be routed in one of k "preferred" layers. Related to the k-layer planar subset problem is the k-layer topological via minimization problem, which is to determine the topology of each net using k routing layers such that a minimum number of vias is used. For the case k = 2, the topological via minimization problem has been studied by CAD researchers for a long time because of its practical and theoretical importance. In this paper, we show that both the general k-layer planar subset problem and the k-layer topological via minimization problem are NP-complete. Moreover, we show that both problems can be solved in polynomial time when the routing regions are crossing channels. It can be shown that under a suitable assumption, all the channels for inter-block connections in the general cell design style are ...

