A Characterization of Processor Performance in the VAX-11/780
- Proceedings of the 11th Annual Symposium on Computer Architecture, 1984
Cited by 29 (2 self)
This paper reports the results of a study of VAX-11/780 processor performance using a novel hardware monitoring technique. A micro-PC histogram monitor was built for these measurements. It keeps a count of the number of microcode cycles executed at each microcode location. Measurement experiments were performed on live timesharing workloads as well as on synthetic workloads of several types. The histogram counts allow the calculation of the frequency of various architectural events, such as the frequency of different types of opcodes and operand specifiers, as well as the frequency of some implementation-specific events, such as translation buffer misses. The measurement technique also yields the amount of processing time spent in various activities, such as ordinary microcode computation, memory management, and processor stalls of different kinds. This paper reports in detail the amount of time the 'average' VAX instruction spends in these activities.
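As a rough illustration of how per-location cycle counts can be turned into a time breakdown per instruction, here is a minimal Python sketch. The addresses, counts, activity boundaries, and the names `cycles_per_instruction` and `activity_of` are all hypothetical and are not taken from the paper; the instruction count is simply assumed to be known.

```python
from collections import defaultdict


def cycles_per_instruction(histogram, activity_of, instruction_count):
    """Return average microcycles per instruction, split by activity.

    histogram: dict mapping microcode address -> cycles counted there.
    activity_of: function mapping a microcode address to an activity label.
    instruction_count: number of instructions executed (assumed known).
    """
    totals = defaultdict(int)
    for address, cycles in histogram.items():
        totals[activity_of(address)] += cycles
    return {name: cycles / instruction_count for name, cycles in totals.items()}


def activity_of(address):
    # Hypothetical partition of the microcode address space.
    if address < 0x100:
        return "compute"
    if address < 0x200:
        return "memory_management"
    return "stall"


# Hypothetical histogram: microcode address -> cycle count.
histogram = {0x000: 1000, 0x010: 2500, 0x100: 600, 0x200: 900}

breakdown = cycles_per_instruction(histogram, activity_of, instruction_count=1000)
for name, cpi in sorted(breakdown.items()):
    print(f"{name}: {cpi:.2f} cycles/instruction")
```

Summing the per-activity values gives the total cycles per instruction, so the same table answers both "how long" and "where the time goes".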
Automatic Design of Computer Instruction Sets
- 1993
Cited by 19 (0 self)
This dissertation presents the thesis that good and usable instruction sets can be automatically derived for a specified data path and benchmark set. This is achieved by a multistep process: generating execution traces for the benchmark programs, sampling these traces to form a large set of small code segments, optimally recompiling these segments using exhaustive search, and finding the cover of the new instructions generated that optimizes the performance metric. The complete process is illustrated by generating an instruction set for a processor optimized for executing compiled Prolog programs. The generated instruction set is compared with the hand-designed VLSI-BAM instruction set. The automatically designed instruction set is smaller and has only a few percent less performance on th...
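The final step described above, choosing which generated instructions to keep, can be pictured with a small Python sketch. This is a simple greedy heuristic substituted for illustration, not the dissertation's cover algorithm, and it does not guarantee an optimal set. The candidate names, segment names, savings table, and `greedy_select` are hypothetical.

```python
def marginal_gain(candidate_savings, best_so_far):
    """Cycles gained beyond what already-chosen instructions save."""
    return sum(
        max(0, saved - best_so_far.get(segment, 0))
        for segment, saved in candidate_savings.items()
    )


def greedy_select(savings, budget):
    """Pick up to `budget` candidate instructions by greedy marginal gain.

    savings[name][segment] = cycles saved on `segment` if `name` is available.
    Assumes each segment benefits only from its single best chosen candidate.
    """
    chosen = []
    best = {}  # segment -> largest saving achieved so far
    for _ in range(budget):
        remaining = [name for name in savings if name not in chosen]
        if not remaining:
            break
        pick = max(remaining, key=lambda name: marginal_gain(savings[name], best))
        if marginal_gain(savings[pick], best) <= 0:
            break  # nothing left improves the metric
        chosen.append(pick)
        for segment, saved in savings[pick].items():
            best[segment] = max(best.get(segment, 0), saved)
    return chosen, sum(best.values())


# Hypothetical candidates and per-segment savings (in cycles).
savings = {
    "fused_cmp_branch": {"seg1": 3, "seg2": 2},
    "tagged_load": {"seg2": 3, "seg3": 1},
    "push_pair": {"seg1": 1},
}

chosen, total_saved = greedy_select(savings, budget=2)
print(chosen, total_saved)
```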
Synchronous Transfer Architecture (STA)
- In Lecture Notes in Computer Science, 2004
Cited by 19 (15 self)
This paper presents a novel micro-architecture for high-performance and low-power DSPs. The underlying Synchronous Transfer Architecture (STA) fills the gap between SIMD-DSPs and coarse-grain reconfigurable hardware. STA processors are modeled using a common machine description suitable for both compiler and core generator. The core generator is able to generate models in LISA, SystemC, and VHDL. Special emphasis is placed on good synthesis of the generated VHDL model.
Relating Static and Dynamic Machine Code Measurements
- IEEE Transactions on Computers, 1992
Cited by 6 (0 self)
In an effort to relate static measurements of machine code instructions and addressing modes to their dynamic counterparts, both types of measurements were made on nine different machines using a large and varied suite of programs. Using classical regression analysis techniques, the relationship between static architecture measurements and dynamic architecture measurements was explored. The statistical analysis showed that many static and dynamic measurements are strongly correlated and that it is possible to use the more easily obtained static measurements to predict dynamic usage of instructions and addressing modes. With few exceptions, the predictions are accurate for most architectural features.
Index Terms: Instruction sets, performance evaluation, computer architecture, dynamic measurements, static measurements, machine design.
I. Introduction Static measurements of program code at machine level are generally thought to be useful for determining textual space needs while dynam...
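To make the idea of predicting dynamic usage from static measurements concrete, here is a minimal Python sketch of an ordinary least-squares line fit. The percentages are invented for illustration and are not the paper's data; `fit_line` is a hypothetical helper, and the paper's actual regression models may differ.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept, plus correlation r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical data: static vs. dynamic frequency (%) of one instruction
# class, one pair per program.
static_pct = [10.0, 20.0, 30.0, 40.0]
dynamic_pct = [12.0, 22.0, 32.0, 42.0]

slope, intercept, r = fit_line(static_pct, dynamic_pct)

# Predict the dynamic frequency for a program whose static frequency is 25%.
predicted = slope * 25.0 + intercept
print(f"slope={slope:.2f} intercept={intercept:.2f} r={r:.2f} predicted={predicted:.2f}")
```

A correlation coefficient close to 1 on real measurements would correspond to the "strongly correlated" cases the abstract mentions.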
PERL - A Registerless Architecture
Reducing the processor-memory speed gap is one of the major challenges computer architects face today. Efficient use of CPU registers reduces the number of memory accesses. However, registers incur the extra overhead of Load/Store instructions, register allocation, and saving register context across procedure calls. Caches, by contrast, have no such overheads, and cache technology has matured to the extent that today the access time of an on-chip cache is almost equal to that of registers. This motivates exploring alternative ways to do away with the overheads of registers. In this paper, we propose a registerless, memory-to-memory processor architecture. We call this architecture the Performance Enhanced Registerless (PERL) processor. All instructions in this processor operate directly on memory operands, thus eliminating the Load/Store and other overheads of registers. The performance of this machine is studied by simulation, and the results are reported in this paper. 1. Introduction A major challe...
Chapter 7 Performance Enhancement
In this chapter, we will concentrate on performance. Performance improvement starts with an analysis of the execution profile to understand where the data path spends most of its time. The effort is then directed at redesigning the data path, usually by increasing the number of concurrent operations in it. Design choices interact: choosing one will affect another. The gain in performance must be weighed against the increased complexity in terms of circuit size or cost. For the purpose of our study, we will not change the instruction set. The performance improvement will come from changes to the “micro-architecture” only.

