The Uniform Memory Hierarchy Model of Computation
Algorithmica, 1992
Cited by 108 (9 self)
The Uniform Memory Hierarchy (UMH) model introduced in this paper captures performance-relevant aspects of the hierarchical nature of computer memory. It is used to quantify architectural requirements of several algorithms and to ratify the faster speeds achieved by tuned implementations that use improved data-movement strategies. A sequential computer's memory is modelled as a sequence $\langle M_0, M_1, \ldots \rangle$ of increasingly large memory modules. Computation takes place in $M_0$. Thus, $M_0$ might model a computer's central processor, while $M_1$ might be cache memory, $M_2$ main memory, and so on. For each module $M_u$, a bus $B_u$ connects it with the next larger module $M_{u+1}$. All buses may be active simultaneously. Data is transferred along a bus in fixed-sized blocks. The size of these blocks, the time required to transfer a block, and the number of blocks that fit in a module are larger for modules farther from the processor. The UMH model is parameterized by the rate at which the blocksizes i...
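The module sequence described in this abstract can be sketched as a small data structure. This is only an illustration: the abstract is truncated before it states the model's parameters, so the geometric growth rule and the names `rho`, `alpha`, and `bandwidth` below are assumptions chosen to satisfy the one property the abstract does state, namely that block size, block count, and block transfer time all increase with distance from the processor.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Module:
    level: int            # u: 0 is closest to the processor
    block_size: int       # words per block moved on the bus to level u+1
    num_blocks: int       # blocks that fit in this module
    transfer_time: float  # time to move one block over the bus


def build_hierarchy(
    levels: int,
    rho: int = 2,                                    # assumed growth factor
    alpha: int = 4,                                  # assumed shape factor
    bandwidth: Callable[[int], float] = lambda u: 1.0,  # assumed words per time unit
) -> List[Module]:
    """Build `levels` modules whose parameters grow geometrically (assumption)."""
    modules = []
    for u in range(levels):
        block_size = rho ** u
        modules.append(
            Module(
                level=u,
                block_size=block_size,
                num_blocks=alpha * rho ** u,
                transfer_time=block_size / bandwidth(u),
            )
        )
    return modules


hierarchy = build_hierarchy(4)
for module in hierarchy:
    print(module)
```

With constant bandwidth, the transfer time of a block grows in step with its size, so each level farther from the processor is larger and slower per block, as the abstract requires.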
Multithreaded Architectures: Principles, Projects and Issues
1994
Cited by 23 (12 self)
The architecture of future high performance computer systems will respond to the possibilities offered by technology and to the increasing demand for attention to issues of programmability. Multithreaded processing element architectures are a promising alternative to RISC architecture and its multiple-instruction-issue extensions such as VLIW, superscalar, and superpipelined architectures. This paper presents an overview of multithreaded computer architectures and the technical issues affecting their prospective evolution. We introduce the basic concepts of multithreaded computer architecture and describe several architectures representative of the design space for multithreaded, parallel computers. We review design issues for multithreaded processing elements intended for use as the node processor of parallel computers for scientific computing. These include the question of choosing an appropriate program execution model, the organization of the processing element to achieve good utilization of major resources, support for fine-grain interprocessor communication and global memory access, compiling machine code for multithreaded processors, and the challenge of implementing virtual memory in large-scale multiprocessor systems.
Early Design Cycle Timing Simulation of Caches
University of Michigan, Ann Arbor, 1996
Flexible Timing Simulation of Multiple-Cache Configurations
1997
Cited by 1 (0 self)
As the gap between processor and memory speeds increases, cache performance becomes more critical to overall system performance. Behavioral cache simulation is typically used early in the design cycle of new processor/cache configurations to determine the performance of proposed cache configurations on target workloads. However, behavioral cache simulation does not account for the latency seen by each memory access. The Latency-Effects (LE) cache model presented in this paper accounts for this nominal latency as well as the additional latencies due to trailing-edge effects, bus width considerations, port conflicts, and the number of outstanding accesses that a cache allows before it blocks. We also extend the LE cache model to handle the latency effects of moving data among multiple caches. mlcache, a new, easily configurable and extensible tool, has been built based on the extended LE model. We show the use of mlcache in estimating the performance of traditional and novel cache configurat...
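The difference between behavioral and timing simulation can be illustrated with a toy direct-mapped cache that reports hit and miss counts (the behavioral result) alongside an accumulated cycle count (a nominal latency). This is a minimal sketch and not the LE model or mlcache: it ignores trailing-edge effects, bus width, port conflicts, and outstanding-access limits, and the cache geometry and latency values are made-up assumptions.

```python
def simulate(addresses, num_lines=4, line_size=16, hit_latency=1, miss_penalty=20):
    """Toy direct-mapped cache; all parameters are hypothetical.

    Returns (hits, misses, cycles). A behavioral simulator would stop at
    hits and misses; the cycle count adds a fixed latency per access.
    """
    tags = [None] * num_lines  # block number currently held by each line
    hits = misses = cycles = 0
    for addr in addresses:
        block = addr // line_size
        index = block % num_lines
        if tags[index] == block:
            hits += 1
            cycles += hit_latency
        else:
            misses += 1
            cycles += hit_latency + miss_penalty
            tags[index] = block  # fill the line with the missing block
    return hits, misses, cycles


# Byte addresses; blocks 0, 4, and 8 all map to line 0 and evict each other.
trace = [0, 4, 64, 0, 128, 132]
hits, misses, cycles = simulate(trace)
print(hits, misses, cycles)  # 2 4 86
```

Two configurations with the same miss count can still differ in total cycles once per-access latency is modelled, which is the gap the abstract points out in purely behavioral simulation.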
Memory System Design For Bus Based Multiprocessors
1991
Cited by 1 (0 self)
This dissertation studies the design of single-bus, shared-memory multiprocessors. The purpose of the studies is to find optimum points in the design space for different memory system components that include private caches, shared bus and main memory. Two different methodologies are used based on the operating environment of a multiprocessor. For a multiprocessor operating in the throughput-oriented environment, Customized Mean Value Analysis (CMVA) models are developed to evaluate the performance of the multiprocessor. The accuracy of the models is validated by comparing their results to those generated by actual trace-driven simulation over several thousand multiprocessor configurations. The comparison results show that the CMVA models can be as accurate as trace-driven simulation in predicting the multiprocessor throughput and bus utilization. The validated models are then used to evaluate design choices that include cache size, cache block size, cache set-associativity, bus switch...
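For orientation, the textbook exact Mean Value Analysis recursion for a closed queueing network can be sketched as follows. This is the standard algorithm, not the customized CMVA models of the dissertation, whose details the abstract does not give. Treating processors as customers and the bus and memory as queueing centers is an assumed mapping, and the service demands are invented numbers.

```python
def mva(demands, customers, think_time=0.0):
    """Exact MVA for a closed network of single-server queueing centers.

    demands[k] is the total service demand of one request at center k.
    Returns (throughput, mean queue length per center) at the given population.
    """
    queue = [0.0] * len(demands)
    throughput = 0.0
    for n in range(1, customers + 1):
        # An arriving customer sees the queue lengths of a network with n-1 customers.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        throughput = n / (think_time + sum(residence))
        queue = [throughput * r for r in residence]  # Little's law per center
    return throughput, queue


# Hypothetical example: 4 processors, bus demand 0.2, memory demand 0.1.
throughput, queue = mva([0.2, 0.1], customers=4)
bus_utilization = throughput * 0.2  # utilization law: U = X * D
print(throughput, bus_utilization)
```

The recursion costs time proportional to the number of customers times the number of centers, which is why analytic models of this kind can cover thousands of configurations far faster than trace-driven simulation.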
Effective Utilization of the Reorder Buffer for Short-Lived Variables
1994
In this paper, we have observed an interesting phenomenon in superscalar architectures with aggressive hardware mechanisms (such as register renaming and reorder buffers): a significant number of program variables are "short-lived" in the sense that their whole live ranges occur entirely within the reorder buffer. Therefore, upon completion, the values of these short-lived variables do not need to be written (committed) back to the register files. Based on this observation, we have proposed a compile-time analysis method (called short-live-range analysis) and a simple architecture feature to avoid the useless commit of these short-lived variables. Furthermore, we propose a compiler mechanism to directly assign these variables to the reorder buffer (instead of the register file), thus decreasing the total requirements on the register file. We have implemented our scheme and our simulation results show: (1) the short-live-range analysis and the proposed architecture feature can be succes...
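The notion of a short-lived value can be illustrated on a straight-line instruction list. The sketch below is not the paper's short-live-range analysis, which the abstract does not specify. It rests on a simplifying assumption: a value counts as short-lived when it is overwritten later in the block and its last use falls fewer than `rob_size` instructions after its definition. The tuple encoding of instructions and the function name are hypothetical.

```python
def short_lived_definitions(block, rob_size):
    """Return indices of definitions whose live range fits an assumed window.

    block: list of (dest, sources) tuples in program order; dest may be None.
    A value that is never redefined inside the block is treated as live-out
    and therefore must be committed (conservative assumption).
    """
    result = []
    for i, (dest, _) in enumerate(block):
        if dest is None:
            continue
        last_use = None
        redefined = False
        for j in range(i + 1, len(block)):
            later_dest, later_sources = block[j]
            if dest in later_sources:  # check uses before the overwrite
                last_use = j
            if later_dest == dest:     # value is dead from here on
                redefined = True
                break
        span = (last_use - i) if last_use is not None else 0
        if redefined and span < rob_size:
            result.append(i)
    return result


block = [
    ("t1", ("a", "b")),   # 0: t1 defined
    ("t2", ("t1", "c")),  # 1: last use of the first t1
    ("t1", ("t2", "d")),  # 2: t1 overwritten, last use of the first t2
    ("x", ("t1",)),       # 3
    ("t2", ("x",)),       # 4: t2 overwritten
]
print(short_lived_definitions(block, rob_size=4))  # [0, 1]
```

Definitions 0 and 1 are consumed and overwritten within a few instructions, so under the stated assumption their values would never need to reach the register file, while the later definitions are live at the end of the block.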

