Results 1 - 10 of 116
The Tera computer system
- In International Conference on Supercomputing
, 1990
Cited by 351 (2 self)
The Tera architecture was designed with several major goals in mind. First, it needed to be suitable for very high speed implementations, i.e., admit a short clock period and be scalable to many processors. This ...
Reconfigurable Architectures and General-Purpose Computing in the MOS VLSI Era
, 1996
Cited by 97 (6 self)
The hallmark of general-purpose computing has been the capability to run almost any large, functionally diverse computational task on a single hardware system. General-purpose computing platforms, to date, have largely been built around one or more moderately coarse-grained, fixed processors. As available silicon density increases, it is worthwhile to consider other computing structures for providing flexible computation. In this paper, we look specifically at both conventional processors and reconfigurable logic to understand their relative merits in general-purpose computing scenarios. A simple analysis of delivered functional capacity suggests that conventional processors are best suited for tasks which require a diverse set of operations which are well matched to the processor's ALU primitives and datapaths. Reconfigurable logic, on the other hand, can deliver higher capacity on a broader range of functionality and datapaths when the function required is highly repetitive. ...
Billion-Transistor Architectures
, 1997
Cited by 66 (4 self)
ns three articles, which appear in Cybersquare. Each describes one trend that will affect future microprocessor architectures. In the second category, each article makes the case for a different billion-transistor architecture. Although these articles represent the state of the art and the authors' best guesses, the future is notoriously hard to predict in our breakneck-paced field. Technology trends are generally easier to predict than their effects, but trend estimates can be wildly inaccurate. Intel's 1989 prediction for 1996 processors underestimated performance by a factor of four. Forecasting the effects of technology is even harder, as illustrated by several well-known quotes: "Everything that can be invented has been invented." (US Commissioner of Patents, 1899.) "I think there is a world market for about five computers." (Thomas J. Watson Sr., IBM founder, 1943.) "There is no reason for any individuals to have a computer in their home. ...
MULTIPROCESSOR SCHEDULING TO ACCOUNT FOR INTERPROCESSOR COMMUNICATION
, 1991
Cited by 64 (11 self)
Interprocessor communication (IPC) overheads have emerged as the major performance limitation in parallel processing systems, due to the transmission delays, synchronization overheads, and conflicts for shared communication resources created by data exchange. Accounting for these overheads is essential for attaining efficient hardware utilization. This thesis introduces two new compile-time heuristics for scheduling precedence graphs onto multiprocessor architectures, which account for interprocessor communication overheads and interconnection constraints in the architecture. These algorithms perform scheduling and routing simultaneously to account for irregular interprocessor interconnections, and schedule all communications as well as all computations to eliminate shared resource contention. The first technique, called dynamic-level scheduling, modifies the classical HLFET list scheduling strategy to account for IPC and synchronization overheads. By using dynamically changing priorities to match nodes and processors at each step, this technique attains an equitable tradeoff between load balancing and interprocessor communication cost. This method is fast, flexible, widely targetable, and displays promising performance. The second technique, called declustering, establishes a parallelism hierarchy upon the precedence graph using graph-analysis techniques which explicitly address the tradeoff between exploiting parallelism and incurring communication cost. By systematically decomposing this hierarchy, the declustering process exposes parallelism instances in order of importance, assuring efficient use of the available processing resources. In contrast with traditional clustering schemes, this technique can adjust the level of cluster granularity to suit the characteristics of the specified architecture, leading to a more effective solution.
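The idea of matching nodes and processors with a priority that changes at each step can be illustrated with a small sketch. This is not the thesis's algorithm: it assumes the priority of a (node, processor) pair is the node's static level (longest computation path to an exit node, as in HLFET) minus the earliest time the node could start on that processor, and it assumes a fixed delay per edge whenever producer and consumer sit on different processors. It does no routing and does not schedule the communication resources themselves. All names (static_levels, schedule, succ, cost, comm) are hypothetical.

```python
def static_levels(succ, cost):
    """Longest path of computation cost from each node to an exit node."""
    levels = {}

    def level(n):
        if n not in levels:
            levels[n] = cost[n] + max((level(s) for s in succ.get(n, [])), default=0)
        return levels[n]

    for n in cost:
        level(n)
    return levels


def schedule(succ, cost, comm, n_procs):
    """Greedy list scheduling with a priority recomputed at every step.

    Assumed priority: static level minus earliest start time on the processor.
    Returns a dict mapping node -> (processor, start, finish).
    """
    pred = {n: [] for n in cost}
    for u, vs in succ.items():
        for v in vs:
            pred[v].append(u)
    sl = static_levels(succ, cost)
    proc_free = [0] * n_procs  # time at which each processor becomes idle
    placed = {}
    while len(placed) < len(cost):
        # a node is ready once all of its predecessors have been scheduled
        ready = [n for n in cost
                 if n not in placed and all(u in placed for u in pred[n])]
        best = None
        for n in ready:
            for p in range(n_procs):
                # data from a predecessor on another processor pays the edge delay
                data_ready = max(
                    (placed[u][2] + (0 if placed[u][0] == p else comm[(u, n)])
                     for u in pred[n]),
                    default=0,
                )
                start = max(data_ready, proc_free[p])
                priority = sl[n] - start
                if best is None or priority > best[0]:
                    best = (priority, n, p, start)
        _, n, p, start = best
        placed[n] = (p, start, start + cost[n])
        proc_free[p] = start + cost[n]
    return placed


# Hypothetical diamond-shaped precedence graph: A feeds B and C, which feed D.
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
cost = {"A": 2, "B": 3, "C": 3, "D": 1}
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
cheap_comm = schedule(succ, cost, {e: 1 for e in edges}, n_procs=2)
costly_comm = schedule(succ, cost, {e: 4 for e in edges}, n_procs=2)
```

On this toy graph, a small edge delay lets the sketch spread B and C over both processors, while a large delay keeps every node on one processor, which is the load-balancing versus communication tradeoff the abstract describes.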
A Parallel Genetic Algorithm for the Set Partitioning Problem
, 1994
Cited by 60 (1 self)
In this dissertation we report on our efforts to develop a parallel genetic algorithm and apply it to the solution of the set partitioning problem--a difficult combinatorial optimization problem used by many airlines as a mathematical model for flight crew scheduling. We developed a distributed steady-state genetic algorithm in conjunction with a specialized local search heuristic for solving the set partitioning problem. The genetic algorithm is based on an island model where multiple independent subpopulations each run a steady-state genetic algorithm on their own subpopulation and occasionally fit strings migrate between the subpopulations. Tests on forty real-world set partitioning problems were carried out on up to 128 nodes of an IBM SP1 parallel computer. We found that performance, as measured by the quality of the solution found and the iteration on which it was found, improved as additional subpopulations were added to the computation. With larger numbers of subpopulations the genetic algorithm was regularly able to find the optimal solution to problems having up to a few thousand integer variables. In two cases, high-quality integer feasible solutions were found for problems with 36,699 and 43,749 integer variables, respectively. A notable limitation we found was the difficulty solving problems with many constraints.
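The island model described in this abstract can be sketched in a few lines. The sketch below is a serial toy, not the dissertation's implementation: the six-column instance, the penalty weight, the tournament selection, the one-point crossover, the ring migration pattern and every name (COLUMNS, PENALTY, fitness, steady_state_step, island_ga) are assumptions made for illustration, and the specialized local search heuristic is omitted.

```python
import random

# Hypothetical toy instance: each column covers a set of rows and has a cost.
COLUMNS = [({0, 1}, 3), ({2, 3}, 3), ({0, 2}, 2), ({1, 3}, 2), ({0, 1, 2, 3}, 7), ({3}, 1)]
N_ROWS = 4
PENALTY = 10  # assumed penalty for each row not covered exactly once


def fitness(bits):
    """Cost of the selected columns plus penalties; lower is better."""
    cover = [0] * N_ROWS
    total = 0
    for bit, (rows, col_cost) in zip(bits, COLUMNS):
        if bit:
            total += col_cost
            for r in rows:
                cover[r] += 1
    return total + PENALTY * sum(1 for k in cover if k != 1)


def steady_state_step(pop, rng):
    """Breed one child and let it replace the worst string if it is no worse."""
    a, b = (min(rng.sample(pop, 2), key=fitness) for _ in range(2))  # two binary tournaments
    cut = rng.randrange(1, len(a))
    child = a[:cut] + b[cut:]          # one-point crossover
    child[rng.randrange(len(child))] ^= 1  # flip one randomly chosen bit
    worst = max(range(len(pop)), key=lambda k: fitness(pop[k]))
    if fitness(child) <= fitness(pop[worst]):
        pop[worst] = child


def island_ga(n_islands=4, pop_size=8, steps=200, migrate_every=20, seed=0):
    """Run independent steady-state GAs with occasional ring migration."""
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in COLUMNS] for _ in range(pop_size)]
               for _ in range(n_islands)]
    for step in range(1, steps + 1):
        for pop in islands:
            steady_state_step(pop, rng)
        if step % migrate_every == 0:
            # copy each island's best string to the next island, replacing its worst
            bests = [list(min(pop, key=fitness)) for pop in islands]
            for k, pop in enumerate(islands):
                worst = max(range(len(pop)), key=lambda j: fitness(pop[j]))
                pop[worst] = bests[k - 1]
    return min((s for pop in islands for s in pop), key=fitness)


best = island_ga(seed=1)
```

In a parallel setting each island would run on its own node and migration would become message passing; here the islands simply take turns inside one loop.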
Users Guide to the PGAPack Parallel Genetic Algorithm Library
, 1996
Cited by 55 (4 self)
Contents: 0 Quick Start; I Getting Started; 1 Introduction; 2 Installation; 2.1 Obtaining PGAPack; 2.2 Requirements; 2.3 Structure of the Distribution Directory; 2.4 Installation Instructions; 2.5 Installation Examples; 2.5.1 Sequential Installation; 2.5.2 Parallel Installation; 2.6 Mailing Lists, Web Pag ...
Alecsys and the AutonoMouse: Learning to Control a Real Robot by Distributed Classifier Systems
- Machine Learning
, 1995
Cited by 41 (16 self)
Abstract. In this article we investigate the feasibility of using learning classifier systems as a tool for building adaptive control systems for real robots. Their use on real robots imposes efficiency constraints which are addressed by three main tools: parallelism, distributed architecture, and training. Parallelism is useful to speed up computation and to increase the flexibility of the learning system design. Distributed architecture helps in making it possible to decompose the overall task into a set of simpler learning tasks. Finally, training provides guidance to the system while learning, shortening the number of cycles required to learn. These tools and the issues they raise are first studied in simulation, and then the experience gained with simulations is used to implement the learning system on the real robot. Results have shown that with this approach it is possible to let the AutonoMouse, a small real robot, learn to approach a light source under a number of different noise and lesion conditions. Keywords: learning classifier systems, reinforcement learning, genetic algorithms, animat problem
Algorithmic redistribution methods for block cyclic decompositions
- IEEE Trans. on PDS
, 1996
Cited by 22 (2 self)
To my parents. Acknowledgments: The writer expresses gratitude and appreciation to the members of his dissertation committee, Michael Berry, Charles Collins, Jack Dongarra, Mark Jones and David Walker for their encouragement and participation throughout my doctoral experience. Special appreciation is due to Professor Jack Dongarra, Chairman, who provided sound guidance, support and appropriate commentaries during the course of my graduate study. I also would like to thank Yves Robert and R. Clint Whaley for many useful and instructive discussions on general parallel algorithms and message passing software libraries. Many valuable comments for improving the presentation of this document were received from L. Susan Blackford. Finally, I am grateful to the Department of Computer Science at the University of Tennessee for allowing me to do this doctoral research work here. A special debt of gratitude is owed to Joanne Martin, IBM POWERparallel Division, for awarding me an IBM Corporation Fellowship covering the tuition as well as a stipend for the 1994-96 academic years. This work was also supported ...

