DAISY: Dynamic Compilation for 100% Architectural Compatibility
, 1997
Cited by 173 (12 self)
Although VLIW architectures offer the advantages of simplicity of design and high issue rates, a major impediment to their use is that they are not compatible with the existing software base. We describe new simple hardware features for a VLIW machine we call DAISY (Dynamically Architected Instruction Set from Yorktown). DAISY is specifically intended to emulate existing architectures, so that all existing software for an old architecture (including operating system kernel code) runs without changes on the VLIW. Each time a new fragment of code is executed for the first time, the code is translated to VLIW primitives, parallelized and saved in a portion of main memory not visible to the old architecture, by a Virtual Machine Monitor (software) residing in read only memory. Subsequent executions of the same fragment do not require a translation (unless cast out). We discuss the architectural requirements for such a VLIW, to deal with issues including self-modifying code, precise exceptions, and aggressive reordering of memory references in the presence of strong MP consistency and memory mapped I/O. We have implemented the dynamic parallelization algorithms for the PowerPC architecture. The initial results show high degrees of instruction level parallelism with reasonable translation overhead and memory usage.
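The translate-on-first-execution behavior described in this abstract can be illustrated with a toy model. This is only a sketch under stated assumptions, not DAISY's actual implementation: the names `TranslationCache`, `fetch`, and `toy_translate` are hypothetical, the "translation" is a placeholder string, and least-recently-used cast-out is an assumed policy that the abstract does not specify.

```python
from collections import OrderedDict


class TranslationCache:
    """Toy model: translate a fragment on first execution, reuse it afterwards.

    Assumption: fragments are cast out in least-recently-used order once the
    cache exceeds `capacity`. The abstract only says fragments may be cast out.
    """

    def __init__(self, capacity, translate):
        self.capacity = capacity
        self.translate = translate      # hypothetical translator callback
        self.cache = OrderedDict()      # fragment address -> translated code
        self.translations = 0           # how many times we had to translate

    def fetch(self, addr):
        if addr in self.cache:
            # Already translated: no translation needed, just mark as recent.
            self.cache.move_to_end(addr)
            return self.cache[addr]
        # First execution (or previously cast out): translate and save.
        code = self.translate(addr)
        self.translations += 1
        self.cache[addr] = code
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # cast out the oldest fragment
        return code


def toy_translate(addr):
    # Placeholder for "translate to VLIW primitives and parallelize".
    return f"vliw<{addr:#x}>"


cache = TranslationCache(capacity=2, translate=toy_translate)
for addr in [0x100, 0x200, 0x100, 0x300, 0x200]:
    cache.fetch(addr)
```

In this run the second visit to 0x100 reuses the saved translation, while 0x200 is cast out when 0x300 arrives and must be translated again on its next execution.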
The Multiscalar Architecture
, 1993
Cited by 113 (8 self)
The centerpiece of this thesis is a new processing paradigm for exploiting instruction level parallelism. This paradigm, called the multiscalar paradigm, splits the program into many smaller tasks, and exploits fine-grain parallelism by executing multiple, possibly (control and/or data) dependent tasks in parallel using multiple processing elements. Splitting the instruction stream at statically determined boundaries allows the compiler to pass substantial information about the tasks to the hardware. The processing paradigm can be viewed as extensions of the superscalar and multiprocessing paradigms, and shares a number of properties of the sequential processing model and the dataflow processing model. The multiscalar paradigm is easily realizable, and we describe an implementation of the multiscalar paradigm, called the multiscalar processor. The central idea here is to connect multiple sequential processors, in a decoupled and decentralized manner, to achieve overall multiple issue. The multiscalar processor supports speculative execution, allows arbitrary dynamic code motion (facilitated by an efficient hardware memory disambiguation mechanism), exploits communication localities, and does all of these with hardware that is fairly straightforward to build. Other desirable aspects of the
Dynamic binary translation and optimization
- IEEE Transactions on Computers
, 2001
Cited by 21 (2 self)
We describe a VLIW architecture designed specifically as a target for dynamic compilation of an existing instruction set architecture. This design approach offers the simplicity and high performance of statically scheduled architectures, achieves compatibility with an established architecture, and makes use of dynamic adaptation. Thus, the original architecture is implemented using dynamic compilation, a process we refer to as DAISY (Dynamically Architected Instruction Set from Yorktown). The dynamic compiler exploits runtime profile information to optimize translations so as to extract instruction level parallelism. This work reports different design trade-offs in the DAISY system and their impact on final system performance. The results show high degrees of instruction parallelism with reasonable translation overhead and memory usage. Index Terms: Dynamic compilation, binary translation, dynamic optimization, just-in-time compilation, adaptive code generation, profile-directed feedback, instruction-level parallelism, very long instruction word architectures, virtual machines, instruction set architectures, instruction set layering.
Execution-based scheduling for VLIW architectures
- In Euro-Par '99 Parallel Processing: 5th International Euro-Par Conference, number 1685 in Lecture
Cited by 20 (15 self)
We describe a new dynamic software scheduling technique for VLIW architectures, which compiles into VLIW code the program paths that are actually executed. Unlike trace processors, or DIF, the technique executes operations speculatively on multiple paths through the code, is resilient to branch mispredictions, and can achieve very large dynamic window sizes necessary for high ILP. Aggressive optimizations are applied to frequently executed portions of the code. Encouraging performance results were obtained on SPECint95 and TPC-C. The technique can be used for binary translation for achieving architectural compatibility with an existing processor, or as a VLIW scheduling technique in its own right. Keywords: Instruction-level parallelism, Dynamic compilation, Binary translation, Superscalar
Binary translation and architecture convergence issues for IBM System/390
- In Proc. of the International Conference on Supercomputing 2000, Santa Fe, NM
, 2000
Cited by 10 (8 self)
We describe the design issues in an implementation of the ESA/390 architecture based on binary translation to a very long instruction word (VLIW) processor. During binary translation, complex ESA/390 instructions are decomposed into instruction “primitives” which are then scheduled onto a wide-issue machine. The aim is to achieve high instruction level parallelism due to the increased scheduling and optimization opportunities which can be exploited by binary translation software, combined with the efficiency of long instruction word architectures. A further aim is to study the feasibility of a common execution platform for different instruction set architectures, such as ESA/390, RS/6000, AS/400 and the Java Virtual Machine, so that multiple systems can be built around a common execution platform.
Sathaye, "Properties of rescheduling size invariance for dynamic rescheduling-based VLIW cross-generation compatibility"
- IEEE Transactions on Computers, Vol 49, Issue
, 1997
Cited by 2 (0 self)
The object-code compatibility problem in VLIW architectures stems from their statically scheduled nature. Dynamic rescheduling (DR) [1] is a technique to solve the compatibility problem in VLIWs. DR reschedules program code pages at first-time page faults, i.e., when the code pages are accessed for the first time during execution. Treating a page of code as the unit of rescheduling makes it susceptible to the hazard of changes in the page size during the process of rescheduling. This paper proves that the changes in the page size are only due to insertion and/or deletion of NOPs in the code. Further, it presents an ISA encoding called list encoding, which does not require explicit encoding of the NOPs in the code. A property of the encoding called rescheduling-size invariance (RSI) is presented and it is proved that the list encoding satisfies this property.
Timing Insensitive Binary to Binary Translation of Real Time Systems
- Workshop on Architectures for Real-Time Applications, ISCA
Cited by 1 (0 self)
Binary to binary translation (BBT) provides a solution to the problem of migrating software from older architectures to newer, faster ones through direct translation of a program executable from one instruction-set to another, without the need for recompilation. While BBT technology is well established for traditional programs, the BBT transformation of real-time programs has additional constraints which are only now addressed. When translating real-time programs, not only must the semantics of the code be preserved, but also the timing of external event processing and generation. This paper describes a technique for maintaining program timing behavior during binary-binary translation, and presents experimental results (for translation from the M68000 to the POWER architecture) that demonstrate that the timing of a translated software system can be held to within 1 ms of the original system with reasonable overhead. 1. Introduction The newest, fastest processors are based on architect...
* This work was supported by National Science Foundation grants CCR-8919635 and CCR-9410706 and by an IBM Graduate Fellowship.
- IEEE Transactions on Computers
, 1996
To exploit instruction level parallelism, it is important not only to execute multiple memory references per cycle, but also to reorder memory references, especially to execute loads before stores that precede them in the sequential instruction stream. To guarantee correctness of execution in such situations, memory reference addresses have to be disambiguated. This paper presents a novel hardware mechanism, called an Address Resolution Buffer (ARB), for performing dynamic reordering of memory references. The ARB supports the following features: (i) dynamic memory disambiguation in a decentralized manner, (ii) multiple memory references per cycle, (iii) out-of-order execution of memory references, (iv) unresolved loads and stores, (v) speculative loads and stores, and (vi) memory renaming. The paper presents the results of a simulation study that we conducted to verify the efficacy of the ARB for a superscalar processor. The paper also shows the ARB's application in a multiscalar proce...
Inherently Lower Complexity Architectures using Dynamic Optimization
, 2002
Based on the conviction that modern superscalar out-of-order designs squander useful resources for little incremental gain, the BOA team embarked on a design effort to develop an architecture where computational elements dominated the design. At the same time, we wanted to preserve the ability to adapt to changing workload behavior dynamically, but without the overhead inherent in traditional out-of-order designs. We turned to maturing dynamic compilation technology to achieve dynamic adaptability, while keeping core complexity low.
RC22025 (98128) 18 July 2000 Computer Science
- IEEE Transactions on Computers
, 2001
We describe a VLIW architecture designed specifically as a target for dynamic compilation of an existing instruction set architecture. This design approach offers the simplicity and high performance of statically scheduled architectures, achieves compatibility with an established architecture, and makes use of dynamic adaptation. Thus, the original architecture is implemented using dynamic compilation, a process we refer to as DAISY (Dynamically Architected Instruction Set from Yorktown). The dynamic compiler exploits runtime profile information to optimize translations so as to extract instruction level parallelism. This work reports different design trade-offs in the DAISY system, and their impact on final system performance. The results show high degrees of instruction parallelism with reasonable translation overhead and memory usage.

