Limits of instruction-level parallelism
, 1991
Cited by 339 (7 self)
... research relevant to the design and application of high performance scientific computers. We test our ideas by designing, building, and using real systems. The systems we build are research prototypes; they are not intended to become products. There are two other research laboratories located in Palo Alto, the Network Systems ...
Tradeoffs in Two-Level On-Chip Caching
- In Proceedings of the 21st Annual International Symposium on Computer Architecture
, 1993
Cited by 94 (4 self)
The performance of two-level on-chip caching is investigated for a range of technology and architecture assumptions. The area and access time of each level of cache is modeled in detail. The results indicate that for most workloads, two-level cache configurations (with a set-associative second level) perform marginally better than single-level cache configurations that require the same chip area once the first-level cache sizes are 64KB or larger. Two-level configurations become even more important in systems with no off-chip cache and in systems in which the memory cells in the first-level caches are multiported and hence larger than those in the second-level cache. Finally, a new replacement policy called two-level exclusive caching is introduced. Two-level exclusive caching improves the performance of two-level caching organizations by increasing the effective associativity and capacity.
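The exclusive policy named in this abstract can be illustrated with a small sketch. This is not the paper's model: it assumes fully associative caches with LRU replacement, and the class name ExclusiveTwoLevel and its parameters are hypothetical. It only shows the defining property that a line lives in one level at a time, so the two levels together can hold l1_lines + l2_lines distinct lines.

```python
from collections import OrderedDict


class ExclusiveTwoLevel:
    """Toy exclusive two-level cache (assumed: fully associative, LRU)."""

    def __init__(self, l1_lines, l2_lines):
        self.l1 = OrderedDict()  # oldest entry first
        self.l2 = OrderedDict()
        self.l1_lines = l1_lines
        self.l2_lines = l2_lines

    def access(self, addr):
        """Return 'l1_hit', 'l2_hit', or 'miss' for one line address."""
        if addr in self.l1:
            self.l1.move_to_end(addr)  # refresh LRU position
            return "l1_hit"
        if addr in self.l2:
            del self.l2[addr]  # the line leaves L2 when it moves to L1
            result = "l2_hit"
        else:
            result = "miss"  # fetched from memory into L1 only
        self._fill_l1(addr)
        return result

    def _fill_l1(self, addr):
        if len(self.l1) >= self.l1_lines:
            victim, _ = self.l1.popitem(last=False)  # evict LRU line of L1
            if len(self.l2) >= self.l2_lines:
                self.l2.popitem(last=False)  # make room in L2
            self.l2[victim] = True  # the L1 victim is kept in L2
        self.l1[addr] = True


if __name__ == "__main__":
    cache = ExclusiveTwoLevel(l1_lines=2, l2_lines=2)
    for a in (0, 1, 2, 3, 0):
        print(a, cache.access(a))
```

Because the L1 victim is written into L2 rather than duplicated there, the two levels never hold the same line, which is where the extra effective capacity comes from in this simplified setting.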
Systems for Late Code Modification
- WRL Research Report 91/5
, 1991
Cited by 87 (5 self)
Modifying code after the compiler has generated it can be useful for both optimization and instrumentation. This paper compares the code modification systems of Mahler and pixie, and describes two new systems we have built that are hybrids of the two. This paper covers material presented at the CODE '91 International Workshop on Code Generation, Schloss Dagstuhl, Germany, May 20-24, 1991. 1. Introduction. Late code modification is the process of modifying the output of a compiler after the compiler has generated it. The reasons one might want to do this fall into two categories, optimization and instrumentation. Some forms of optimization must be performed on assembly-level or machine-level code. The oldest is peephole optimization [11], which acts to tidy up code that a compiler has generated; it has since been generalized to include transformations on more machine-independent code [2,3]. Reordering of code to avoid pipeline stalls [4,7,18] is most often done after the code is gene...
Experience with a Software-Defined Machine Architecture
- WRL Research Report 91/10
, 1991
Cited by 53 (7 self)
We built a system in which the compiler back end and the linker work together to present an abstract machine at a considerably higher level than the actual machine. The intermediate language translated by the back end is the target language of all high-level compilers and is also the only assembly language generally available. This lets us do intermodule register allocation, which would be harder if some of the code in the program had come from a traditional assembler, out of sight of the optimizer. We do intermodule register allocation and pipeline instruction scheduling at link time, using information gathered by the compiler back end. The mechanism for analyzing and modifying the program at link time was also useful in a wide array of instrumentation tools. 1. Introduction. When our lab built its experimental RISC workstation, the Titan, we defined a high-level assembly language as the official interface to the machine. This high-level assembly language, called Mahler,...
Procedure Merging with Instruction Caches
- Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation
, 1991
Cited by 49 (0 self)
This paper describes a method of determining which procedures to merge for machines with instruction caches. The method uses profile information, the structure of the program, the cache size, and the cache miss penalty to guide the choice. Optimization for the cache is assumed to follow procedure merging. The method weighs the benefit of removing calls against the increase in the instruction cache miss rate. Better performance is achieved than with previous schemes that do not consider the cache. Merging always results in a savings, unlike simpler schemes that can make programs slower once cache effects are considered. The new method also has better performance even when parameters to simpler algorithms are varied to get the best performance. This report is a preprint of a paper that will be presented at the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation, Toronto, Ontario, Canada, June 26-28, 1991. Copyright 1990 ACM. 1 Introduction. This paper presents a ...
Fluoroelastomer Pressure Pad Design for Microelectronic Applications
, 1993
Cited by 26 (1 self)
The elastic properties of gum rubber and fluoroelastomers were studied by a variety of numerical and experimental methods. Results were applied to the design of flat pressure pads for microelectronic applications. The goal was to develop an understanding sufficient that designers could quickly develop acceptable fluoroelastomer pressure pads without further detailed studies. The effort centered on optimizing the performance of a 14 mm square by 0.8 mm thick pad under a fixed normal force. The primary optimization criterion was minimization of the maximum normal contact stresses applied by the pad to a rigid surface. Judicious perforation of flat pads greatly reduced adverse contact stress gradients. The preferred design used four 1.2 mm holes symmetrically arrayed in a 4 mm square grid centered on the pad. Compared to an unperforated pad, this arrangement yielded a 28% reduction in maximum contact stresses.
Unreachable Procedures in Object-oriented Programming
- ACM Letters on Programming Languages and Systems
, 1993
Cited by 25 (4 self)
Unreachable procedures are procedures that can never be invoked. Their existence may adversely affect the performance of a program. Unfortunately, their detection requires the entire program to be present. Using a link-time code modification system, we analyze large linked program modules of C++, C and Fortran. We find that C++ programs using object-oriented programming style contain a large fraction of unreachable procedure code. In contrast, C and Fortran programs have a low and essentially constant fraction of unreachable code. In this paper, we present our analysis of C++, C and Fortran programs, and discuss how object-oriented programming style generates unreachable procedures. This paper will appear in the ACM LOPLAS Vol 1, #4. It replaces Technical Note TN-21, an earlier version of the same material. 1 Introduction. Unreachable procedures unnecessarily bloat an executable, making it require more disk space and decreasing its locality, which may affect its cache and paging be...
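The abstract's point that detection needs the whole program can be seen in a minimal sketch of call-graph reachability. This is a generic textbook traversal, not the paper's link-time system; the function name, the dictionary representation of the call graph, and the example procedure names are all assumptions made for illustration. Indirect calls (virtual functions, procedure pointers) are not modeled; a conservative tool would have to add every address-taken procedure to the roots.

```python
from collections import deque


def unreachable_procedures(call_graph, roots):
    """Return procedures that no chain of direct calls from `roots` can reach.

    call_graph maps a procedure name to the names it calls directly.
    (Assumed representation; a real linker would derive it from relocations.)
    """
    # Every procedure mentioned anywhere, as a caller or as a callee.
    all_procs = set(call_graph)
    for callees in call_graph.values():
        all_procs.update(callees)

    seen = set(roots)
    work = deque(roots)
    while work:  # breadth-first walk over direct call edges
        proc = work.popleft()
        for callee in call_graph.get(proc, ()):
            if callee not in seen:
                seen.add(callee)
                work.append(callee)
    return all_procs - seen


if __name__ == "__main__":
    # Hypothetical program: Shape::area is defined but never called.
    graph = {
        "main": ["draw"],
        "draw": [],
        "Shape::area": ["helper"],
        "helper": [],
    }
    print(sorted(unreachable_procedures(graph, ["main"])))
```

Note that helper is reported too: it is called only from a procedure that is itself unreachable, which a per-module check could not determine.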
Link-Time Code Modification
- DEC Western Research Lab
, 1989
Cited by 22 (4 self)
Many existing or potential programming tools require the program to be completely recompiled with a special compiler option. This is usually inconvenient for the program developer, and may reduce the usefulness of the tool or the frequency with which the tool is employed. It may also require the maintenance of different versions of standard libraries, each compiled with the appropriate options for a different tool. The difference between modules compiled with and without the special option is often simple and regular. If so, we can effect this difference by modifying the normally-compiled object code at link time, instead of recompiling. This reduces the overhead of using the tool by an order of magnitude, making it much more convenient. 1. Introduction. Recompiling an entire multi-module program from scratch is usually so expensive that one does it only reluctantly. In spite of this, many useful tools for program optimization or performance analysis require the recompilation of...
Boolean Matching For Full-Custom ECL Gates
, 1994
Cited by 17 (0 self)
We present a technology mapper for full-custom ECL gates. These gates are characterized by high fanins and a regular structure. Full-custom gates differ from ECL library gates in that a full range of structures is available as a single form, rather than a large number of individual gates that sparsely cover the possible design space. This paper presents a complete boolean matching algorithm and gives a proof of its correctness. We show that it can efficiently map logic into the general ECL gate form. We also show two variants of the algorithm, and show that they give poorer results with no savings in runtime. The mapper described in the paper is a necessary component of a CAD system for designing ECL microprocessors. Manual design of full-custom ECL gates would not be acceptable for control logic since it is a tedious, error prone, and lengthy activity. Nor would a gate-array style mapper and library with a limited number of gates be acceptable, because this makes less effective use of...
A One-Dimensional Thermal Model for the VAX Multi Chip Units
, 1990
A thermal resistance network is used to predict the performance of the Multi Chip Units (MCUs) in the VAX 9000 computer. This branched network is comprised of resistors defined by analytical, numerical and experimental techniques. Effects of thermal conduction, contact resistance and convection are included. A comparison is made between the model's temperature predictions and test data. Agreement within 15% is achieved, demonstrating that the chips in the MCU will operate well below their specification limit of 85 °C. This is a preprint of a paper that will be presented at the American Society of Mechanical Engineers, Winter Annual Meeting, Dallas, Texas, 1990. Copyright 1990 ASME. Table of Contents: Nomenclature; 1. Introduction; 2. Hardware Description; 2.1. Silicon Chips; 2.2. Epoxy; 2.3. Baseplate; 2.4. Heat Sink; 3. Thermal Model; 3.1. Silicon Thin Film Heater Chips; 3.2. Epoxy; 3.3. Baseplate; 3....
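The idea of a one-dimensional thermal resistance network can be sketched as follows. This is only the standard series/parallel resistor analogy, not the paper's branched model: the function names are hypothetical, and every resistance, power, and coolant temperature below is an invented placeholder, not data from the report. Only the 85 °C limit comes from the abstract.

```python
def series(*resistances):
    """Thermal resistances stacked along one heat path add (°C/W)."""
    return sum(resistances)


def parallel(*resistances):
    """Parallel heat paths combine like parallel electrical resistors (°C/W)."""
    return 1.0 / sum(1.0 / r for r in resistances)


def chip_temperature(power_w, coolant_c, total_resistance):
    """Steady-state temperature: T_chip = T_coolant + Q * R_total."""
    return coolant_c + power_w * total_resistance


if __name__ == "__main__":
    # Placeholder values (assumptions), one resistor per layer in the stack:
    # chip, epoxy, baseplate, heat sink.
    r_total = series(0.25, 0.5, 0.25, 1.0)
    t_chip = chip_temperature(power_w=20.0, coolant_c=25.0,
                              total_resistance=r_total)
    limit_c = 85.0  # specification limit quoted in the abstract
    print(t_chip, t_chip < limit_c)
```

A branched network like the one the abstract describes would combine series() and parallel() calls to reduce the network to a single equivalent resistance before applying the last formula.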

