Results 1 - 10 of 21
The Case for a Single-Chip Multiprocessor
IEEE Computer, 1996
Cited by 326 (5 self)
Advances in IC processing allow for more microprocessor design options. The increasing gate density and cost of wires in advanced integrated circuit technologies require that we look for new ways to use their capabilities effectively. This paper shows that in advanced technologies it is possible to implement a single-chip multiprocessor in the same area as a wide issue superscalar processor. We find that for applications with little parallelism the performance of the two microarchitectures is comparable. For applications with large amounts of parallelism at both the fine and coarse grained levels, the multiprocessor microarchitecture outperforms the superscalar architecture by a significant margin. Single-chip multiprocessor architectures have the advantage in that they offer localized implementation of a high-clock rate processor for inherently sequential applications and low latency interprocessor communication for parallel applications.
Power Minimization in IC Design: Principles and Applications
ACM Transactions on Design Automation of Electronic Systems, 1996
Cited by 136 (22 self)
Low power has emerged as a principal theme in today's electronics industry. The need for low power has caused a major paradigm shift in which power dissipation is as important as performance and area. This article presents an in-depth survey of CAD methodologies and techniques for designing low power digital CMOS circuits and systems and describes the many issues facing designers at architectural, logic and physical levels of design abstraction. It reviews some of the techniques and tools that have been proposed to overcome these difficulties and outlines the future challenges that must be met to design low power, high performance systems.
Power considerations in the design of the Alpha 21264 microprocessor
In 35th Design Automation Conference, 1998
Cited by 66 (1 self)
Power dissipation is rapidly becoming a limiting factor in high performance microprocessor design due to ever increasing device counts and clock rates. The 21264 is a third generation Alpha microprocessor implementation, containing 15.2 million transistors and operating at 600 MHz. This paper describes some of the techniques the Alpha design team utilized to help manage power dissipation. In addition, the electrical design of the power, ground, and clock networks is presented.
Low-Power Encodings for Global Communication in CMOS VLSI
1997
Cited by 37 (2 self)
Technology trends and especially portable applications are adding a third dimension (power) to the previously two-dimensional (speed, area) VLSI design space [30]. A large portion of power dissipation in high performance CMOS VLSI is due to the inherent difficulties in global communication at high rates and we propose several approaches to address the problem. These techniques can be generalized at different levels in the design process. Global communication typically involves driving large capacitive loads which inherently require significant power. However, by carefully choosing the data representation, or encoding, of these signals, the average and peak power dissipation can be minimized. Redundancy can be added in space (number of bus lines), time (number of cycles) and voltage (number of distinct amplitude levels). The proposed codes can be used on a class of terminated off-chip board-level buses with level signaling, or on tri-state on-chip buses with level or transition signalin...
Optimal Decoupling Capacitor Sizing and Placement for Standard Cell Layout Designs
IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 1995
Cited by 35 (2 self)
With technology scaling, the trend for high performance integrated circuits is towards ever higher operating frequency, lower power supply voltages and higher power dissipation.
A Study of Single-Chip Processor/Cache Organizations for Large Numbers of Transistors
In Proceedings of the 21st Annual International Symposium on Computer Architecture, 1994
Cited by 31 (1 self)
This paper presents a trace-driven simulation-based study of a wide range of cache configurations and processor counts. This study was undertaken in an attempt to help answer the question of how best to allocate large numbers of transistors, a question that is rapidly increasing in importance as transistor densities continue to climb. At what point does continuing to increase the size of the on-chip first level cache cease to provide sufficient increases in hit rate and become prohibitively difficult to access in a single cycle? In order to compare different configurations, the concept of an Equivalent Cache Transistor is presented. Results indicate that the access time of the first-level data cache is more important than the size. In addition, it appears that once approximately 15 million transistors become available, a two processor configuration is preferable to a single processor with correspondingly larger caches. 1. Introduction Advances in VLSI technology are reducing the minim...
Comparing Computing Machines
1998
Cited by 14 (0 self)
Reconfigurable computing devices are emerging as a viable alternative to fixed-function components and programmable processors. To expand our knowledge of the role and optimization of these devices, it is increasingly imperative for us to compare implementations of tasks and subroutines across this wide spectrum of implementation options. The fact that most processors, FPGAs, ASICs, and memories are fabricated in a uniform technology medium, CMOS VLSI, where area scaling is moderately well understood eases our comparison task. Nonetheless, the rapid pace of technology, limited device size selection, and economic artifacts complicate the picture. In this paper, we look at the task of comparing computing machines, reviewing normalization techniques and many important issues which arise during comparisons. This paper includes examples intended to underscore the methodology and comparison issues, but does not attempt to make definitive conclusions about the merits of the technology alterna...
Circuit Implementation of a 600 MHz Superscalar
International Conference on Computer Design, 1998
Cited by 7 (0 self)
The circuit techniques used to implement a 600MHz, out-of-order, superscalar RISC Alpha microprocessor are described. Innovative logic and circuit design created a chip that attains 30+ SpecInt95 and 50+ SpecFP95, and supports a secondary cache bandwidth of 6.4GB/s. Microarchitectural techniques were used to optimize latencies and cycle time, while a variety of static and dynamic design methods balanced critical path delays against power consumption. The chip relies heavily on full custom design and layout to meet speed and area goals. An extensive CAD suite guaranteed the integrity of the design.
Impact of Heterogeneity on DSM Performance
1999
Cited by 6 (1 self)
This paper explores area/parallelism tradeoffs in the design of distributed shared-memory (DSM) multiprocessors built out of large single-chip computing nodes. In this context, area-efficiency arguments motivate a heterogeneous organization consisting of few nodes with large caches designed for single-thread parallelism, and a larger number of nodes with smaller caches designed for multi-thread parallelism. This paper quantitatively studies the performance of such organization for a set of homogeneous multiprocessor programs from the SPLASH-2 benchmark suite. These programs are mapped onto the heterogeneous processors without source code modifications via static thread assignment policies.
VLSI Datapath Choices: Cell-Based Versus Full-Custom
1998
Cited by 5 (1 self)
Traditionally, VLSI architects and designers have acknowledged the area, performance, and effort tradeoffs between cell-based and full-custom implementations of the same datapath function. However, few attempts have been made to characterize these tradeoffs in the context of contemporary fabrication processes and area place and route tools. More importantly, few attempts have been made to determine how to enable cell-based implementations to approach the density and speed of full-custom designs. This work quantifies the limits of cell-based datapath implementations based on results derived from a detailed analysis of the density and performance tradeoffs in the implementation of two full-custom datapaths, the Integer Register-Read Datapath (IRRDP) and the 64-bit adder/subtracter (ADDSUB), employed in the multi-ALU Processor (MAP) chip. A cell-based implementation of the IRRDP is 1.64x larger than the full-custom original. The critical timing path for the cell-based implementation is 11...

