Reducing State Loss for Effective Trace Sampling of Superscalar Processors
- In Proceedings of the 1996 International Conference on Computer Design (ICCD)
, 1996
Cited by 88 (2 self)
There is a wealth of technological alternatives that can be incorporated into a processor design. These include reservation station designs, functional unit duplication, and processor branch handling strategies. The performance of a given design is measured through the execution of application programs and other workloads. Presently, trace-driven simulation is the most popular method of processor performance analysis in the development stage of system design. Current techniques of trace-driven simulation, however, are extremely slow and expensive. In this paper, a fast and accurate method for statistical trace sampling of superscalar processors is proposed.
Dynamic Rescheduling: A Technique for Object Code Compatibility in VLIW Architectures
, 1995
Cited by 43 (12 self)
Lack of object code compatibility in VLIW architectures is a severe limit to their adoption as a general-purpose computing paradigm. Previous approaches include hardware and software techniques, both of which have drawbacks. Hardware techniques add to the complexity of the architecture, whereas software techniques require multiple executables. This paper presents a technique called Dynamic Rescheduling that applies software techniques dynamically, using intervention by the operating system. Results are presented to demonstrate the viability of the technique using the Illinois IMPACT compiler and the TINKER architectural framework.
1 Introduction
Lack of object-code compatibility across generations of a VLIW architecture is an often raised objection to its use as a general-purpose computing paradigm [1]. A program binary compiled for VLIW generation x cannot be guaranteed to execute correctly on generations x + n or x - n, for a reasonable value of n. This means that an installe...
A Parallel Genetic Algorithm for Multiobjective Microprocessor Design
- Proceedings of the Sixth International Conference on Genetic Algorithms
, 1995
Cited by 13 (0 self)
The microprocessor chip designer must solve the problem of partitioning millions of transistors into an arbitrary number of hardware structures within a finite chip area toward achieving maximum performance. This combinative complexity is compounded by a lengthy performance evaluation of each proposed design. We present the application of a real-valued multiobjective genetic algorithm on an asynchronous parallel workstation network as an optimization approach well suited to this problem. By casting design budget constraints as multiple design objectives, the need for penalty functions is eliminated. A microprocessor cache memory design problem is optimized with the genetic algorithm.
1 Microprocessor Design Problem
Microprocessor chip designers now have more transistors and design alternatives available to them than at any time in the past. The chip designer's selection of hardware structures from many alternatives (e.g., adders, multipliers, memories) must maximize microprocessor perfo...
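As an illustration of treating budgets as objectives rather than penalties, the sketch below compares candidate designs by Pareto dominance, a common comparison in multiobjective search. The abstract does not say which ranking scheme the paper uses, so this is an assumption, and the `(chip_area, miss_ratio)` values are made up for the example.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if design a is no worse than b in every objective and strictly
    better in at least one. All objectives are minimized."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only the designs that no other design dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (chip_area, miss_ratio) pairs; not data from the paper.
designs = [(4.0, 0.10), (6.0, 0.05), (6.5, 0.07), (9.0, 0.04)]
front = pareto_front(designs)
```

Here the area budget is simply one more objective to minimize, so no penalty weight has to be tuned to trade it against miss ratio.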
System-Level Power Consumption Modeling and Tradeoff Analysis Techniques for Superscalar Processor Design
, 1997
Cited by 12 (0 self)
High-level decisions in high-performance processors are often decoupled from their ultimate impact on power usage. For example, superscalar hardware and high degrees of pipelining are excellent sources for high parallelism. They often result in higher power usage. This problem is further complicated by the usage patterns of each unit in the processor. The usage patterns are determined by the programs the system executes, and ultimately by the applications the processor is targeted towards. This paper presents systematic techniques to find low-power, high-performance superscalar processors tailored to specific user applications. The model of power is novel because it separates power into architectural and technology components. The architectural component is found via trace-driven simulation, which also produces performance estimates. An example technology model is presented that estimates the technology component, along with critical delay time and real estate usage. This model is base...
Applying Programming Language Implementation Techniques to Processor Simulation
, 2000
Cited by 8 (2 self)
This memoization makes the simulator run 5 to 12 times faster, with no change in simulation results (e.g., cycle count). Combining direct-execution and memoization, FastSim simulates a MIPS R10000-like microarchitecture with a 190 to 360 times slowdown (i.e., simulation time over native benchmark execution time on the host), which is an order of magnitude faster than SimpleScalar.
Tradeoffs in Processor/Memory Interfaces for Superscalar Processors
- In Proceedings 25th Annual International Symposium on Microarchitecture
, 1992
Cited by 6 (0 self)
This paper addresses the relative merits of the three processor/memory interface schemes by constructing a fair comparison between the schemes. Six members of the SPEC89 benchmark set are used [4]. For each benchmark (where possible) a cache size is selected that achieves a fixed miss ratio. Miss ratios of 5% and 10% are used for the study. This effectively removes the fixed cache size problem and replaces it with fixed cache performance.
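The fixed-miss-ratio selection described above can be sketched as follows. This is a minimal illustration, not the paper's procedure: the function name, the tolerance, and the measured miss ratios are all hypothetical.

```python
from typing import Dict, Optional


def size_for_miss_ratio(measured: Dict[int, float],
                        target: float,
                        tolerance: float = 0.005) -> Optional[int]:
    """Return the cache size (bytes) whose measured miss ratio is closest to
    the target, or None when no size comes within the tolerance (the
    'where possible' case)."""
    if not measured:
        return None
    best = min(measured, key=lambda size: abs(measured[size] - target))
    return best if abs(measured[best] - target) <= tolerance else None


# Hypothetical per-benchmark measurements: cache size -> miss ratio.
bench_a = {1024: 0.18, 2048: 0.102, 4096: 0.051, 8192: 0.02}
bench_b = {1024: 0.03, 2048: 0.02}  # never reaches 10% at these sizes

size_a_10 = size_for_miss_ratio(bench_a, 0.10)
size_a_05 = size_for_miss_ratio(bench_a, 0.05)
size_b_10 = size_for_miss_ratio(bench_b, 0.10)
```

Each benchmark then runs with its own cache size, so the schemes are compared at equal cache performance rather than equal cache capacity.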
A technique to determine power-efficient, high-performance superscalar processors
- In Proceedings of the Twenty-Eighth Hawaii International Conference on System Sciences
, 1995
Cited by 6 (0 self)
Processor performance advances are increasingly inhibited by limitations in thermal power dissipation. Part of the problem is the lack of architectural power estimates before implementation. Although high-performance designs exist that dissipate low power, the method for finding these designs has been through trial-and-error. This paper presents systematic techniques to find low-power, high-performance superscalar processors tailored to specific user benchmarks. The model of power is novel because it separates power into architectural and technology components. The architectural component is found via trace-driven simulation, which also produces performance estimates. An example technology model is presented that estimates the technology component, along with critical delay time and real estate usage. This model is based on case studies of actual designs. It is used to solve an important problem: increasing the duplication in superscalar execution units without excessive power consumption. Results are presented from runs using simulated annealing to maximize processor performance subject to power and area constraints. The major contributions of this paper are the separation of architectural and technology components of dynamic power, the use of trace-driven simulation for architectural power measurement, and the use of a near-optimal search to tailor a processor design to a benchmark.
Systematic Objective-driven Computer Architecture Optimization
- in Proc. 16th Conference on Advanced Research in VLSI (ARVLSI'95)
, 1995
Cited by 3 (1 self)
Computer designers now have more transistors and architectural alternatives than at any time. Computer-aided design tools automate much of the physical design process. However, few tools have been developed to help the computer architect specify near-optimal microarchitectural configurations in the early design stages. Such tools are needed to systematically guide the early design specifications subject to multiple objectives such as cost, performance, and power consumption. This paper illustrates an objective-driven microarchitectural design methodology that couples the specification design phase with an optimization technique. The design of a memory hierarchy with multiple performance objectives is used as a case study. This is a directed search problem with a high dimensionality. We show that the genetic algorithm, a global optimization technique based on the metaphor of natural selection and survival of the fittest, is an ideal candidate for such an objective-driven search in a hig...
Optimization of VLIW Compatibility Systems Employing Dynamic Rescheduling
- International Journal of Parallel Programming
Cited by 3 (0 self)
Lack of object code compatibility in VLIW architectures is a severe limit to their adoption as a general-purpose computing paradigm. Previous approaches include hardware and software techniques, both of which have drawbacks. Hardware techniques add to the complexity of the architecture, whereas software techniques require multiple executables. This paper presents a technique called Dynamic Rescheduling that applies software techniques dynamically, using intervention by the OS: at each first-time page fault, the page of code is rescheduled for the new generation, if required. Results are presented to demonstrate the viability of the technique using the Illinois IMPACT compiler and the TINKER architectural framework. For the machine models and the workloads used in this study, performance of the rescheduled code compares well with the native scheduled code for a machine. The behavior of a subset of programs in the workload is such that they face a large number of first-time page ...
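The control flow named in the abstract (reschedule a code page on its first-time page fault, only if required) can be sketched roughly as below. Every name here is hypothetical: `CodePage`, `FaultHandler`, and the `reschedule` callback are stand-ins for illustration, not interfaces from the paper, IMPACT, or TINKER.

```python
from dataclasses import dataclass
from typing import Callable, Set


@dataclass
class CodePage:
    number: int          # page number within the program image
    scheduled_for: int   # VLIW generation the code was scheduled for


class FaultHandler:
    """Toy model of the OS hook: reschedule a page only on its first fault,
    and only when it targets a different machine generation."""

    def __init__(self, machine_generation: int,
                 reschedule: Callable[[CodePage, int], None]) -> None:
        self.machine_generation = machine_generation
        self.reschedule = reschedule      # hypothetical rescheduler callback
        self.seen: Set[int] = set()       # pages that have already faulted
        self.reschedule_count = 0

    def on_page_fault(self, page: CodePage) -> CodePage:
        first_time = page.number not in self.seen
        self.seen.add(page.number)
        if first_time and page.scheduled_for != self.machine_generation:
            self.reschedule(page, self.machine_generation)
            page.scheduled_for = self.machine_generation
            self.reschedule_count += 1
        return page


# Stub rescheduler that does nothing; a real one would rewrite the code.
handler = FaultHandler(machine_generation=2, reschedule=lambda page, gen: None)
old_page = CodePage(number=0, scheduled_for=1)
native_page = CodePage(number=1, scheduled_for=2)

handler.on_page_fault(old_page)     # first fault, wrong generation: rescheduled
handler.on_page_fault(old_page)     # later fault on same page: no extra work
handler.on_page_fault(native_page)  # already scheduled for this machine
```

In this model the rescheduling cost is paid once per page, which is why programs that touch many pages for the first time are the ones most exposed to the overhead.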
Evaluation Of Some Superscalar And VLIW Processor Designs
, 1992
Cited by 1 (0 self)
CONTENTS
1. INTRODUCTION
2. OVERVIEW OF IMPACT
3. CODE SCHEDULING
4. HARDWARE MODELS
4.1 The VLIW Hardware Model
4.2 Superscalar In-order Hardware Model
4.3 Superscalar Out-of-order Hardware Model
4.4 Register Renaming
4.5 Branch Prediction
5. SIMULATOR
5.1 Simulation of a VLIW Architecture

