The Warp Computer: Architecture, Implementation, and Performance
IEEE Transactions on Computers, 1987
Cited by 42 (2 self)
Abstract
The Warp machine is a systolic array computer of linearly connected cells, each of which is a programmable processor capable of performing 10 million floating-point operations per second (10 MFLOPS). A typical Warp array includes 10 cells, thus having a peak computation rate of 100 MFLOPS. The Warp array can be extended to include more cells to accommodate applications capable of using the increased computational bandwidth. Warp is integrated as an attached processor into a UNIX host system. Programs for Warp are written in a high-level language supported by an optimizing compiler.
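The 100 MFLOPS peak is simply cells times per-cell rate (10 × 10 MFLOPS), and it is reached only when every cell does useful work on every cycle. As a rough illustration of how a linear systolic array keeps all of its cells busy, the Python sketch below simulates a generic textbook example, Horner evaluation of a polynomial with one coefficient held in each cell. This is an assumption for illustration only, not Warp's cell design or programming model; `systolic_horner` and its register layout are hypothetical.

```python
from collections import deque
from typing import List, Optional, Sequence, Tuple

# Hypothetical model: each cell's output register holds (x, partial result) or None.
Token = Optional[Tuple[float, float]]


def systolic_horner(coeffs: Sequence[float], xs: Sequence[float]) -> Tuple[List[float], int]:
    """Evaluate p(x) for each x on a linear array with one cell per coefficient.

    coeffs are ordered highest degree first. Every cycle, cell k turns the
    (x, acc) pair from its left neighbour into (x, acc * x + coeffs[k]),
    i.e. one multiply and one add per cell per cycle.
    """
    n = len(coeffs)
    regs: List[Token] = [None] * n
    pending = deque(xs)
    results: List[float] = []
    cycles = 0
    # Keep clocking while inputs remain or values are still inside the array.
    while pending or any(r is not None for r in regs[:-1]):
        # A new x enters cell 0 with acc = 0; every other cell reads its left neighbour.
        inputs = [(pending.popleft(), 0.0) if pending else None] + regs[:-1]
        regs = [None if t is None else (t[0], t[1] * t[0] + c)
                for t, c in zip(inputs, coeffs)]
        cycles += 1
        if regs[-1] is not None:  # the last cell emits a finished p(x)
            results.append(regs[-1][1])
    return results, cycles
```

For example, `systolic_horner([2.0, -3.0, 1.0], [0.0, 1.0, 2.0, 3.0])` evaluates 2x^2 - 3x + 1 and returns `([1.0, 0.0, 3.0, 10.0], 6)`. Once the pipeline fills, one result leaves the array per cycle, so m inputs on n cells take m + n - 1 cycles; adding cells lengthens the pipeline rather than slowing each result.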
Instruction replication for clustered microarchitectures
Proceedings of the 36th International Symposium on Microarchitecture, 2003
Cited by 11 (1 self)
Abstract
This work presents a new compilation technique that uses instruction replication to reduce the number of communications executed on a clustered microarchitecture. For such architectures, the need to communicate values between clusters can result in a significant performance loss. Inter-cluster communications can be reduced by selectively replicating an appropriate set of instructions. However, instruction replication must be done carefully, since it may also degrade performance through the increased contention it places on processor resources. The proposed scheme is built on top of a previously proposed state-of-the-art modulo scheduling algorithm that effectively reduces communications. Results show that replication reduces the number of communications, which yields significant speed-ups: IPC increases by 25% on average for a 4-cluster microarchitecture and by as much as 70% for selected programs.
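The trade-off described above can be made concrete with a small model. The Python sketch below is a simplified greedy heuristic over a data-dependence graph with fixed cluster assignments, not the paper's scheme (which is integrated into a modulo scheduler). The graph encoding, the names `needed_comms` and `replicate`, and the `budget` cap standing in for resource contention are all assumptions. A producer is copied into a consumer's cluster only when the copy's own operands cost fewer new communications than the copy saves.

```python
from typing import Dict, List, Optional, Set, Tuple

Preds = Dict[str, List[str]]     # hypothetical: instruction -> instructions producing its operands
Placement = Dict[str, Set[int]]  # hypothetical: instruction -> clusters executing a copy of it


def needed_comms(preds: Preds, placement: Placement) -> Set[Tuple[str, int]]:
    """(value, cluster) pairs that must cross the inter-cluster network.

    A value is sent to a given cluster at most once; every consumer there reads it.
    """
    comms: Set[Tuple[str, int]] = set()
    for node, clusters in placement.items():
        for cluster in clusters:
            for producer in preds.get(node, []):
                if cluster not in placement[producer]:
                    comms.add((producer, cluster))
    return comms


def replicate(preds: Preds, placement: Placement, budget: int) -> Placement:
    """Greedily add up to `budget` instruction copies, each chosen to remove the
    most communications; a copy with no net saving is never added."""
    result = {n: set(cs) for n, cs in placement.items()}
    for _ in range(budget):
        current = needed_comms(preds, result)
        best: Optional[Tuple[int, str, int]] = None  # (gain, producer, cluster)
        for producer, cluster in sorted(current):
            trial = {n: set(cs) for n, cs in result.items()}
            trial[producer].add(cluster)  # the copy may now need its own operands sent
            gain = len(current) - len(needed_comms(preds, trial))
            if gain > 0 and (best is None or gain > best[0]):
                best = (gain, producer, cluster)
        if best is None:
            break
        result[best[1]].add(best[2])
    return result
```

For instance, if instruction `a` (no operands) runs in cluster 0 and feeds consumers in clusters 1 and 2, two communications are needed; `replicate(..., budget=1)` copies `a` into cluster 1 and leaves one. If instead `a` depends on a value `x` that lives only in cluster 0, copying `a` into a remote cluster merely trades one communication for another, so the heuristic leaves the placement unchanged.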
A Communication Architecture for Multiprocessor Networks
1989
Cited by 5 (0 self)
Abstract
The system described in this thesis explores the territory between the two classical multiprocessor families: shared-memory and message-passing machines. Like shared-memory systems, the proposed architecture presents the user with a logically uniform address space shared by all processors. This programming model is supported directly by dedicated communication hardware that translates memory references into messages exchanged over a network of point-to-point channels. The key parts of this work are the communication system and its integration with contemporary processor and memory components to form a homogeneous, general-purpose multiprocessor. The communication system is based on an adaptive routing heuristic that is independent of the actual network topology. High priority was given to optimal use of the physical bandwidth, even under heavy or saturated load conditions. The communication system can be extended in small, incremental upgrades and supports medium-haul channels ...
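A minimal sketch of the translation step described above, assuming a global address whose high bits name the home node and whose low bits give the offset into that node's local memory. The 20-bit split, the `ReadRequest` message format, and the `Network` and `Node` classes are hypothetical; the thesis's actual message formats and its adaptive routing are not modeled, and the network here only counts request and reply messages.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

OFFSET_BITS = 20  # hypothetical: low 20 bits address a node's local memory


def split_address(addr: int) -> Tuple[int, int]:
    """Global address -> (home node, local offset)."""
    return addr >> OFFSET_BITS, addr & ((1 << OFFSET_BITS) - 1)


@dataclass
class ReadRequest:  # hypothetical message format
    src: int
    dest: int
    offset: int


class Network:
    """Stands in for the point-to-point channels; it only counts messages."""

    def __init__(self) -> None:
        self.nodes: Dict[int, "Node"] = {}
        self.messages = 0

    def round_trip(self, req: ReadRequest) -> int:
        self.messages += 2  # one request message, one reply carrying the data
        return self.nodes[req.dest].memory.get(req.offset, 0)


class Node:
    def __init__(self, node_id: int, net: Network) -> None:
        self.id = node_id
        self.memory: Dict[int, int] = {}
        self.net = net
        net.nodes[node_id] = self

    def load(self, addr: int) -> int:
        home, offset = split_address(addr)
        if home == self.id:  # local reference: plain memory access, no message
            return self.memory.get(offset, 0)
        # Remote reference: the communication hardware turns it into a message.
        return self.net.round_trip(ReadRequest(self.id, home, offset))
```

The processor issues the same `load` for every address; whether it becomes a local access or a request/reply exchange is decided entirely by the address, which is what lets the machine present a uniform shared address space on top of message passing.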
Loop Optimization Techniques On Multi-Issue Architectures
1994
Abstract
CONTENTS
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER I INTRODUCTION
   1 Scheduling
   2 Methodology
   3 Research Contributions
   4 Thesis Organization
CHAPTER II INSTRUCTION

