Limits of instruction-level parallelism, 1991
Cited by 339 (7 self)
research relevant to the design and application of high performance scientific computers. We test our ideas by designing, building, and using real systems. The systems we build are research prototypes; they are not intended to become products. There are two other research laboratories located in Palo Alto, the Network Systems
Complexity/Performance Tradeoffs with Non-Blocking Loads, 1994
Cited by 85 (9 self)
Non-blocking loads are a very effective technique for tolerating cache-miss latency on data cache references. We describe several methods for implementing non-blocking loads. A range of resulting hardware complexity/performance tradeoffs is investigated using an object-code translation and instrumentation system. We have investigated the SPEC92 benchmarks and have found that, for the integer benchmarks, a simple hit-under-miss implementation achieves almost all of the available performance improvement for relatively little cost. However, for most of the numeric benchmarks, more expensive implementations are worthwhile. The results also point out the importance of using a compiler capable of scheduling load instructions for cache misses rather than cache hits in non-blocking systems. This Research Report is a preprint of a paper to appear at the 21st Annual International Symposium on Computer Architecture. Digital Western Research Laboratory, 250 University Avenue, Palo Alto, ...
How Useful Are Non-blocking Loads, Stream Buffers, and Speculative Execution in Multiple Issue Processors?, 1994
Cited by 47 (2 self)
We investigate the relative performance impact of non-blocking loads, stream buffers, and speculative execution, used both individually and in conjunction with each other. We have simulated the SPEC92 benchmarks on a statically scheduled quad-issue processor model, running code from the Multiflow compiler. Non-blocking loads and stream buffers both provide a significant performance advantage, and their combination performs significantly better than either alone. For example, with a 64-byte, 2-way set associative cache with 32 cycle fetch latency, non-blocking loads reduce the run-time by 21% while stream buffers reduce it by 26%, and the combined use of the two yields a 47% reduction. The addition of speculative execution further improves the performance of the systems that we have simulated, with or without non-blocking loads and stream buffers, by an additional 20% to 40%. We expect that the use of all three of these techniques will be important in future generations of microprocessor...
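As a quick check on the figures quoted in the abstract above, the following sketch compares the reported 47% combined reduction with what the individual 21% and 26% reductions would give if they composed multiplicatively. The multiplicative model and the function names are assumptions made here for illustration only; they are not taken from the paper.

```python
# Illustrative arithmetic only: the independence model below is an assumption,
# not a claim made by the paper.

def combined_if_independent(*reductions):
    """Run-time reduction if each technique scaled the remaining run time independently."""
    remaining = 1.0
    for r in reductions:
        remaining *= (1.0 - r)
    return 1.0 - remaining


def speedup(reduction):
    """Convert a fractional run-time reduction into a speedup factor."""
    return 1.0 / (1.0 - reduction)


# Figures quoted in the abstract.
NON_BLOCKING = 0.21
STREAM_BUFFERS = 0.26
COMBINED_REPORTED = 0.47

independent = combined_if_independent(NON_BLOCKING, STREAM_BUFFERS)

print(f"independent composition:  {independent:.1%}")                  # 41.5%
print(f"reported combination:     {COMBINED_REPORTED:.1%}")            # 47.0%
print(f"speedup at 47% reduction: {speedup(COMBINED_REPORTED):.2f}x")  # 1.89x
```

Under this assumed model the two techniques would give roughly a 41.5% reduction, so the reported 47% (which equals the sum of the individual figures) is larger than independent composition would predict.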
Experience with a Wireless World Wide Web Client, 1994
Cited by 18 (0 self)
Recursive Layout Generation - WRL Research Report 95/2, 1995
Cited by 17 (0 self)
A 300MHz 115W 32b Bipolar ECL Microprocessor, 1993
Cited by 15 (5 self)
A full-custom single-chip bipolar ECL RISC microprocessor was implemented in a 1.0µm single-poly bipolar technology. This research prototype contains a CPU and on-chip 2KB instruction and 2KB data caches. Worst-case power dissipation with a nominal -5.2V supply is 115W. The chip has been designed for a worst-case clock frequency of 275MHz at a nominal supply. The chip verifies a new style of CAD tools developed during the design process, advanced packaging techniques for high-power microprocessors, and VLSI ECL circuit techniques. This Research Report is a reprint of a paper appearing in the November 1993 issue of the IEEE Journal of Solid-State Circuits.
Drip: A Schematic Drawing Interpreter - WRL Research Report 95/1, 1995
Cited by 11 (0 self)
This paper presents a design capture system in which schematics are translated into a procedural netlist specification language. The circuit designer draws schematics with a standard structured graphics editor that knows nothing about netlists or schematics. The translator program analyzes the structured graphics output file and translates it into a procedural netlist specification.
Performance implications of multiple pointer sizes - In: USENIX Winter, 1995
Cited by 9 (0 self)
... This paper analyzes several programs and programming techniques to understand the performance implications of different pointer sizes. Many (but not all) programs show small but definite performance consequences, primarily due to cache and paging effects.
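One way to see where cache and paging effects of this kind can come from is to compare the footprint of a pointer-heavy record under two pointer sizes. The sketch below is an illustration under assumptions that are not taken from the paper: a hypothetical list node holding one 32-bit integer and one "next" pointer, packed with no alignment padding, and a hypothetical 32-byte cache line.

```python
import struct

# Hypothetical cache line size, chosen only for illustration.
CACHE_LINE_BYTES = 32

# Hypothetical list node: a 32-bit integer payload followed by a "next" pointer.
# "<" selects standard sizes with no alignment padding.
NODE_32 = "<iI"  # 32-bit payload + 32-bit pointer
NODE_64 = "<iQ"  # 32-bit payload + 64-bit pointer


def node_size(layout):
    """Size in bytes of one packed node with the given struct layout."""
    return struct.calcsize(layout)


def nodes_per_line(layout, line_bytes=CACHE_LINE_BYTES):
    """Number of whole nodes that fit in one cache line."""
    return line_bytes // node_size(layout)


for name, layout in (("32-bit pointers", NODE_32), ("64-bit pointers", NODE_64)):
    print(f"{name}: {node_size(layout)} bytes/node, "
          f"{nodes_per_line(layout)} nodes per {CACHE_LINE_BYTES}-byte line")
```

With these assumed layouts the node grows from 8 to 12 bytes, so fewer nodes fit in each cache line or page. A real compiler would usually add alignment padding as well, which this sketch ignores.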
Piecewise Linear Models for Rsim, 1993
Cited by 1 (0 self)
Rsim is a switch-level simulator which can simulate large digital MOS integrated circuits with speedups of over 3 orders of magnitude over SPICE. Unfortunately, Rsim's simple switched-resistor model renders it incapable of simulating certain CMOS and most BiCMOS and ECL digital circuits. We observe that the switched-resistor model is just one particular piecewise linear model and that Rsim's simulation framework can accommodate more elaborate piecewise linear models. The resulting simulator, Mom, combines the efficiency of switch-level simulation with the ability to simulate a wider variety of circuits. We demonstrate Mom's efficiency and flexibility on a variety of circuits. This research was supported in part by DARPA contract N00039-91-C-1038. 1 Introduction The high cost of semiconductor processing makes it desirable to verify the correctness of a large custom digital integr...
Circuit and Process Directions for Low-Voltage Swing Submicron BiCMOS, 1994
Low-swing (<600mV) submicron BiCMOS circuits have many advantages over full-swing BiCMOS, CMOS, or small-swing bipolar circuits. We show that the optimal speed fan-in for low-swing BiCMOS logic circuits is generally in the range of 7 to 20, depending on the process characteristics and gate topology. This high fan-in means that the bipolar device parasitic capacitances primarily determine the circuit speed and speed-power products, instead of f_T as in the case of low fan-in mux/demux communication circuits. SiGe HBT BiCMOS circuits are attractive for logic circuits not primarily for their higher f_T, but rather for their increased maximum device currents for a given parasitic capacitance and for their smaller V_be, which can lower chip power dissipation. Finally, for small-swing BiCMOS circuits to be competitive with CMOS they must also be built from the same lithography as CMOS circuits, have local interconnect for interdevice intra-gate wiring, and be built with a full-custom d...

