The Landscape of Parallel Computing Research: A View from Berkeley
- Technical Report, UC Berkeley, 2006
A Network on Chip Architecture and Design Methodology
- In IEEE Computer Society Annual Symposium on VLSI, 2002
Cited by 108 (16 self)
We propose a packet-switched platform for single-chip systems which scales well to an arbitrary number of processor-like resources. The platform, which we call Network-on-Chip (NOC), includes both the architecture and the design methodology. The NOC architecture is an m × n mesh of switches, and resources are placed on the slots formed by the switches. We assume a direct layout of the 2-D mesh of switches and resources, providing physical-architectural level design integration. Each switch is connected to one resource and four neighboring switches, and each resource is connected to one switch. A resource can be a processor core, memory, an FPGA, a custom hardware block or any other intellectual property (IP) block which fits into the available slot and complies with the interface of the NOC. The NOC architecture is essentially the on-chip communication infrastructure comprising the physical layer, the data link layer and the network layer of the OSI protocol stack. We define the concept of a region, which occupies an area of any number of resources and switches. This concept allows the NOC to accommodate large resources such as large memory banks, FPGA areas, or special-purpose computation resources such as high-performance multi-processors. The NOC design methodology consists of two phases. In the first phase a concrete architecture is derived from the general NOC template. The concrete architecture defines the number of switches and shape of the network, the kind and shape of regions and the number and kind of resources. The second phase maps the application onto the concrete architecture to form a concrete product.
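The mesh topology described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `build_mesh` and `attach_resources` and the resource labels are hypothetical, and the sketch assumes a plain (non-wrapping) mesh, so switches on the border have fewer than four neighbors.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]  # (row, column) position of a switch in the mesh


def build_mesh(m: int, n: int) -> Dict[Coord, List[Coord]]:
    """Return switch-to-switch adjacency for an m x n mesh.

    Assumption: no wrap-around links, so border switches have
    two or three neighbors instead of four.
    """
    if m < 1 or n < 1:
        raise ValueError("mesh dimensions must be positive")
    mesh: Dict[Coord, List[Coord]] = {}
    for r in range(m):
        for c in range(n):
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            # Keep only neighbors that lie inside the mesh.
            mesh[(r, c)] = [(i, j) for i, j in candidates
                            if 0 <= i < m and 0 <= j < n]
    return mesh


def attach_resources(mesh: Dict[Coord, List[Coord]]) -> Dict[Coord, str]:
    """Attach exactly one (hypothetically named) resource to each switch."""
    return {sw: f"resource_{sw[0]}_{sw[1]}" for sw in mesh}


if __name__ == "__main__":
    mesh = build_mesh(3, 4)
    resources = attach_resources(mesh)
    print(len(mesh), "switches;", len(resources), "resources")
    print("neighbors of interior switch (1, 1):", mesh[(1, 1)])
```

A region, as the abstract defines it, would span several slots of this grid; the sketch leaves regions out and models only the one-resource-per-switch case.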
3-D ICs: A Novel Chip Design for Improving Deep-Submicrometer Interconnect Performance and Systems-on-Chip Integration
- Proceedings of the IEEE, 2001
Cited by 78 (5 self)
This paper analyzes the limitations of the existing interconnect technologies and design methodologies and presents a novel three-dimensional (3-D) chip design strategy that exploits the vertical dimension to alleviate the interconnect-related problems and to facilitate heterogeneous integration of technologies to realize a system-on-a-chip (SoC) design. A comprehensive analytical treatment of these 3-D ICs has been presented and it has been shown that by simply dividing a planar chip into separate blocks, each occupying a separate physical level interconnected by short and vertical interlayer interconnects (VILICs), significant improvement in performance and reduction in wire-limited chip area can be achieved, without the aid of any other circuit or design innovations. A scheme to optimize the interconnect distribution among different interconnect tiers is presented and the effect of transferring the repeaters to upper Si layers has been quantified in this analysis for a two-layer 3-D
Reconfigurable Computing for Digital Signal Processing: A Survey
- Journal of VLSI Signal Processing, 2000
Cited by 45 (2 self)
Steady advances in VLSI technology and design tools have extensively expanded the application domain of digital signal processing over the past decade. While application-specific integrated circuits (ASICs) and programmable digital signal processors (PDSPs) remain the implementation mechanisms of choice for many DSP applications, increasingly new system implementations based on reconfigurable computing are being considered. These flexible platforms, which offer the functional efficiency of hardware and the programmability of software, are quickly maturing as the logic capacity of programmable devices follows Moore's Law and advanced automated design techniques become available. As initial reconfigurable technologies have emerged, new academic and commercial efforts have been initiated to support power optimization, cost reduction, and enhanced run-time performance. This paper presents a survey of academic research and commercial development in reconfigurable computing for DSP systems o...
A Methodology for Correct-by-Construction Latency Insensitive Design
- In Proc. Intl. Conf. on Computer-Aided Design, 2003
Cited by 40 (8 self)
In Deep Sub-Micron (DSM) designs, performance will depend critically on the latency of long wires. We propose a new synthesis methodology for synchronous systems that makes the design functionally insensitive to the latency of long wires. Given a synchronous specification of a design, we generate a functionally equivalent synchronous implementation that can tolerate arbitrary communication latency between latches. By using latches we can break a long wire into short segments which can be traversed while meeting a single clock cycle constraint. The overall goal is to obtain a design that is robust with respect to delays of long wires, in a shorter time by reducing the multiple iterations between logical and physical design, and with performance that is optimized with respect to the speed of the single components of the design. In this paper we describe the details of the proposed methodology as well as report on the latency insensitive design of PDLX, an out-of-order microprocessor with speculative execution.
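The idea of breaking a long wire into single-cycle segments can be made concrete with a small arithmetic sketch. This is an illustration only, not the paper's synthesis procedure: it assumes the wire delay divides evenly across equal-length segments, and the function names and the nanosecond figures are hypothetical.

```python
import math


def segments_needed(wire_delay_ns: float, clock_period_ns: float) -> int:
    """Smallest number of equal segments whose delay fits in one clock cycle.

    Assumption: total wire delay splits evenly across segments.
    """
    if wire_delay_ns <= 0 or clock_period_ns <= 0:
        raise ValueError("delay and clock period must be positive")
    return math.ceil(wire_delay_ns / clock_period_ns)


def latches_to_insert(wire_delay_ns: float, clock_period_ns: float) -> int:
    """Latches placed between segments: one fewer than the segment count."""
    return segments_needed(wire_delay_ns, clock_period_ns) - 1


if __name__ == "__main__":
    # Hypothetical numbers: a 2.5 ns wire under a 1.0 ns clock.
    print(segments_needed(2.5, 1.0), "segments,",
          latches_to_insert(2.5, 1.0), "latches")
```

Under these assumptions a wire that already meets the clock period needs no extra latches, and each additional clock period of wire delay adds one latch and one cycle of latency that the latency-insensitive design is meant to tolerate.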
Impact of Spatial Intrachip Gate Length Variability on the Performance of High-Speed Digital Circuits
- IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2002
Cited by 32 (1 self)
In this paper we address both empirically and theoretically the impact of an advanced manufacturing phenomenon on the performance of high-speed digital circuits. Using data collected from an actual state-of-the-art fabrication facility, we conducted a comprehensive characterization of an advanced 0.18-µm CMOS process. The measured data revealed a significant systematic, rather than random, spatial intrachip variability of MOS gate length, leading to large circuit path delay variation. The delay of the critical path of a combinational logic block varies by as much as 17%, and the global skew is increased by 8%. Thus, a significant timing error and performance loss takes place if variability is not properly addressed. We derive a model, which allows estimating performance degradation for the given circuit and process parameters. We demonstrate explicitly that intrachip gate variation has a significant detrimental impact on the overall circuit performance, shifting the entire distribution of clock frequencies toward slower values. This is in striking contrast to the impact of interchip gate variation, traditionally considered in statistical circuit analysis, which leads to the variation of chip clock frequencies around the average value. Moreover, analysis shows that the spatial, rather than proximity-dependent systematic gate variability, is the main cause of large circuit speed degradation. The degradation is worse for the circuits with a larger number of critical paths and shorter average logic depth. We propose a location-dependent timing analysis methodology that allows mitigation of the detrimental effects of gate variability and have developed a tool linking the layout-dependent spatial information to circuit analysis. We discuss the details of practical implementat...
Getting to the Bottom of Deep Submicron II: A Global Wiring Paradigm
- In Proceedings of the 1999 International Symposium on Physical Design, 1999
Cited by 29 (5 self)
Global interconnect is commonly regarded as a key potential bottleneck to the advancing performance of high-speed integrated circuits. Previous work has suggested that local interconnect effects can be managed through a deep submicron design hierarchy that uses 50,000 to 100,000 gate modules as primitive building blocks. This work aims to examine interconnect at the global level to determine if there are any significant roadblocks which will prevent National Technology Roadmap for Semiconductors expectations regarding clock speed from being met. Specifically, the issues of global RC delay, signal time-of-flight, inductance, clock and power distribution, and noise are studied. Results indicate that, while global clock frequencies will necessarily be lower than local clock speeds, NTRS expectations should be attainable to the 50-nm technology generation. Achieving these high clock speeds (10 GHz local clock) will be aided by the use of a newly proposed routing hierarchy which limits inte...
Performance Driven Multi-level and Multiway Partitioning with Retiming
- In Proc. Design Automation Conf., 2000
Cited by 26 (13 self)
In this paper, we study the performance-driven multiway circuit partitioning problem with consideration of the significant difference of local and global interconnect delay induced by the partitioning. We develop an efficient algorithm HPM (Hierarchical Performance driven Multi-level partitioning) that simultaneously considers cutsize and delay minimization with retiming. HPM builds a multi-level cluster hierarchy and performs various refinements while gradually decomposing the clusters for simultaneous cutsize and delay minimization. We provide comprehensive experimental justification for each step involved in HPM and in-depth analysis of the cutsize and delay tradeoff existing in the performance-driven partitioning problem. HPM obtains (i) 7% to 23% better delay compared to the state-of-the-art cutsize-driven hMetis [11] at the expense of a 19% increase in cutsize, and (ii) 81% better cutsize compared to the state-of-the-art delay-driven PRIME [2] at the expense of a 6% increase in delay.
Energy-Efficient Signal Processing via Algorithmic Noise-Tolerance
- 1999
Cited by 18 (2 self)
In this paper, we propose a framework for low-energy digital signal processing (DSP) where the supply voltage is scaled beyond the critical voltage required to match the critical path delay to the throughput. This deliberate introduction of input-dependent errors leads to degradation in the algorithmic performance, which is compensated for via algorithmic noise-tolerance (ANT) schemes. The resulting setup, which comprises the DSP architecture operating at sub-critical voltage and the error control scheme, is referred to as soft DSP. It is shown that technology scaling renders the proposed scheme more effective as the delay penalty suffered due to voltage scaling reduces due to short channel effects. The effectiveness of the proposed scheme is also enhanced when arithmetic units with a higher "delay-imbalance" are employed. A prediction-based error-control scheme is proposed to enhance the performance of the filtering algorithm in the presence of errors due to soft computations. For a frequ...
System-Level Performance Modeling with BACPAC - Berkeley Advanced Chip Performance Calculator
- Proc. SLIP, 1999
Cited by 16 (3 self)
In this paper, a new system-level performance model is introduced. The model, called BACPAC (Berkeley Advanced Chip Performance Calculator), is comprised of a large set of analytical models that have been newly developed and/or compiled to collectively describe the performance characteristics of future high-performance designs. New deep submicron effects are considered in BACPAC that were ignored in previous system-level performance models, such as noise and enhanced power modeling. BACPAC users can explore the capabilities of future VLSI designs given certain expected input conditions. The model can be used to predict clock frequency, chip area, power consumption, yield and other important characteristics of future systems.

