Out-of-Order Vector Architectures
, 1997
Cited by 46 (21 self)
Register renaming and out-of-order instruction issue are now commonly used in superscalar processors. These techniques can also be used to significant advantage in vector processors, as this paper shows. Performance is improved and available memory bandwidth is used more effectively. Using a trace-driven simulation we compare a conventional vector implementation, based on the Convex C3400, with an out-of-order, register renaming, vector implementation. When the number of physical registers is above 12, out-of-order execution coupled with register renaming provides a speedup of 1.24 to 1.72 for realistic memory latencies. Out-of-order techniques also tolerate main memory latencies of 100 cycles with a performance degradation less than 6%. The mechanisms used for register renaming and out-of-order issue can be used to support precise interrupts, generally a difficult problem in vector machines. When precise interrupts are implemented, there is typically less than a 10% degradation in performance. A new technique based on register renaming is targeted at dynamically eliminating spill code; this technique is shown to provide an extra speedup ranging between 1.10 and 1.20 while reducing total memory traffic by an average of 15-20%.
Automatic Design of Computer Instruction Sets
, 1993
Cited by 19 (0 self)
This dissertation presents the thesis that good and usable instruction sets can be automatically derived for a specified data path and benchmark set. This is achieved by a multistep process: generating execution traces for the benchmark programs, sampling these traces to form a large set of small code segments, optimally recompiling these segments using exhaustive search, and finding the cover of the new instructions generated that optimizes the performance metric. The complete process is illustrated by generating an instruction set for a processor optimized for executing compiled Prolog programs. The generated instruction set is compared with the hand-designed VLSI-BAM instruction set. The automatically designed instruction set is smaller and has only a few percent less performance on th...
The Expected Lifetime of "Single-Address-Space" Operating Systems
, 1994
Cited by 9 (1 self)
Trends toward shared-memory programming paradigms, large (64-bit) address spaces, and memory-mapped files have led some to propose the use of a single virtual-address space, shared by all processes and processors. Typical proposals require the single address space to contain all process-private data, shared data, and stored files. To simplify management of an address space where stale pointers make it difficult to re-use addresses, some have claimed that a 64-bit address space is sufficiently large that there is no need to ever re-use addresses. Unfortunately, there has been no data to either support or refute these claims, or to aid in the design of appropriate address-space management policies. In this paper, we present the results of extensive kernel-level tracing of the workstations in our department, and discuss the implications for single-address-space operating systems. We found that single-address-space systems will not outgrow the available address space, but only if reasonable space-allocation policies are used, and only if the system can adapt as larger address spaces become available.
Looking Backward and Forward at the Internet
- The Information Society
, 1998
Cited by 8 (0 self)
*This version of the article is a late working copy. Please do not quote exactly without
Smart Register Files for High-Performance Microprocessors
, 1999
Cited by 4 (1 self)
This report examines how the compiler can more efficiently use a large number of processor registers. The placement of data items into registers, called register allocation, is known to be one of the most important compiler optimizations for high-speed computers because registers are the fastest storage devices in the computer system. However, register allocation has been limited in scope because of aliasing in the memory system. To break this limitation and allow more data to be placed into registers, new compiler and microarchitecture support is needed. We propose the modification of register access semantics to include an indirect access mode. We call this optimization the Smart Register File. The smart register file allows the relaxation of overly-conservative assumptions in the compiler by having the hardware provide support for aliased data items in processor registers. As a result, the compiler can allocate data from a larger pool of candidates than in a conventional system. An...
Automatic Specification of Reliability Models for Fault-Tolerant Computers
, 1993
Cited by 4 (0 self)
Semi-Markov Specification Interface to the SURE Tool) program, which uses an abstract language for specifying Markov reliability models, is described in Butler (1986). The language has statements to specify the state space, by defining the state variables and their range; the start state, by the initial values of the state variables; the death states, by a Boolean expression of the state variables; and the state transitions, by a set of if-then rules that define, in terms of the state variables, the possible transitions, their rates, and their destination states. This language has been implemented in the ASSIST program to generate Markov reliability models in the SURE input language (Johnson 1986). The implementation provides three optional state space reduction techniques. The first technique is pruning the model during its generation by conservatively assuming system failure once a state satisfies a prune condition specified as a Boolean expression of the state variables (Johnson 198...
Compiler and Microarchitecture Mechanisms for Exploiting Registers to Improve Memory Performance
, 2001
Cited by 3 (0 self)
... name for a data object.
Def: Set of data that is defined by a statement.
Use: Set of data that is used by a reference.
DefUseChain: Marker that indicates whether reaching-definition analysis was run.
DefUseSummary: Mod-ref information for a function. It contains the non-local data items which are potentially modified or referenced by the function.
ReachingDef: For each statement, collection of definitions that reach the statement.
Replacement: For each node on the parse tree, provides a back pointer to the parent node and implements node self-replacement.
AvailableExpression: For each statement, the collection of expressions that reach the statement.
ValueProfile: For each function, all the parameters and the values they take.
Labelflow: Store goto and label information.
LiveOut, LiveIn, LiveVariable: Used during inlining to estimate when inlining should not be performed because of high register pressure.
Table 7.3: MIRV attributes (columns: Type Name, Description).
7.5.14. Po...
A Review of HDLs
, 1990
Cited by 1 (1 self)
The complex nature of computer design and the need for more sophisticated tools to help the computer designer are stated. The benefits that one can get from hardware description languages (HDLs) and the basic requirements for such languages are presented. A classification scheme for HDLs based on the process of computer design gives a good insight into this area. The register transfer level of abstraction is detailed, for this level is more closely related to the LIDEX project. A few representative HDLs (CDL, AHPL, ISPS and VHDL) are presented and discussed. This work was supported in part by the Interamerican Development Bank/University of São Paulo project, and the U. S. Department of Energy under Grant No. DE-FG02-85ER25001. José E. Moreira is with Laboratório de Sistemas Integráveis, and Wilson V. Ruggiero is with Laboratório de Sistemas Digitais. Contents: 1 Introduction; 2 The Need for HDLs; 3 Benefits of HDLs; 4 Requirements for HDLs; 5 Classification of HDLs; ...
CA Computer Architecture
Abstraction means programmers can describe algorithms in a "high level" notation that is independent of details about the machine that will execute the algorithm. Portability is a byproduct of abstraction that allows programs to be run on a wide variety of computers as long as there is a compiler that will translate them for each machine. In most programming situations reality is close to the ideal. Compilers for many high level languages are very good at generating efficient and portable code for typical computer systems, so programmers are able to express algorithms in high level languages and expect them to run efficiently on almost any machine. There may be a few isolated places where a programmer who invests a lot of effort may be able to write a more efficient routine in assembly language (the native language of the machine), but it is hardly ever worth the effort to write an entire program in assembly language. Obviously when all or part of a program is written in assembler it is not as...

