On embedding a microarchitectural design language within Haskell
In Proceedings of the ACM SIGPLAN International Conference on Functional Programming (ICFP ’99), 1999
Cited by 32 (4 self)
Based on our experience with modelling and verifying microarchitectural designs within Haskell, this paper examines our use of Haskell as host for an embedded language. In particular, we highlight our use of Haskell's lazy lists, type classes, lazy state monad, and unsafePerformIO, and point to several areas where Haskell could be improved in the future. We end with an example of a benefit gained by bringing the functional perspective to microarchitectural modelling.
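The abstract's central device is a signal modelled as a lazy list with one value per clock cycle. Purely as an illustration of that idea, and not the paper's Haskell library, the sketch below imitates such signals with Python generators. The names `delay`, `lift2` and `sample` are hypothetical. Unlike Haskell lists, a generator is consumed as it is read, so sharing one signal between two consumers would need `itertools.tee`.

```python
from __future__ import annotations

from itertools import count, islice
from typing import Callable, Iterable, Iterator, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def delay(init: A, signal: Iterable[A]) -> Iterator[A]:
    """Clocked register: outputs `init` in cycle 0, then its input one cycle late."""
    yield init
    yield from signal


def lift2(f: Callable[[A, B], C], xs: Iterable[A], ys: Iterable[B]) -> Iterator[C]:
    """Combinational logic: apply `f` to the two input signals cycle by cycle."""
    for x, y in zip(xs, ys):
        yield f(x, y)


def sample(signal: Iterable[A], cycles: int) -> list[A]:
    """Observe the first `cycles` values of a possibly infinite signal."""
    return list(islice(signal, cycles))


# An adder whose output is latched by a register initialised to 0.
# The inputs are the infinite signals 0, 1, 2, ... and 10, 11, 12, ...
registered_sum = delay(0, lift2(lambda a, b: a + b, count(0), count(10)))
first_cycles = sample(registered_sum, 5)   # [0, 10, 12, 14, 16]
```

Only the values actually sampled are ever computed, which is what makes infinite signals usable in this style.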
Coming challenges in microarchitecture and architecture
Proc. IEEE, 2001
Cited by 14 (0 self)
In the past several decades, the world of computers, and especially that of microprocessors, has witnessed phenomenal advances. Computers have exhibited ever-increasing performance and decreasing costs, making them more affordable and, in turn, accelerating additional software and hardware development that fueled this process even more. The technology that enabled this exponential growth is a combination of advancements in process technology, microarchitecture, architecture, and design and development tools. While the pace of this progress has been quite impressive over the last two decades, it has become increasingly difficult to sustain. New process technology requires more expensive megafabs, and new performance levels require larger dies, higher power consumption, and enormous design and validation effort. Furthermore, as CMOS technology continues to advance, microprocessor design is exposed to a new set of challenges. In the near future, microarchitecture will have to consider and explicitly manage the limits of semiconductor technology, such as wire delays, power dissipation, and soft errors. In this paper, we describe the role of microarchitecture in the computer world, present the challenges ahead of us, and highlight areas where microarchitecture can help address these challenges.
Keywords: design tradeoffs, microarchitecture, microarchitecture trends, microprocessor, performance improvements, power issues, technology scaling.
Architecture and Compiler Design Issues in Programmable Media Processors
2000
Cited by 10 (6 self)
The processing demands for multimedia applications are rapidly escalating. Many current applications are pushing the limits of existing microprocessors, and the next generation of multimedia promises considerably greater demands. Adequate support for future multimedia requires the flexibility and computing power of high-level language (HLL) programmable media processors. This thesis examines the architecture and compiler design issues for programmable media processors. Design of the architecture requires an accurate understanding of multimedia characteristics. Using the MediaBench benchmark suite and the Impact compiler, workload and architecture evaluations were performed to define the essential architecture for programmable media processors. The workload evaluation examines various processing aspects, including functional necessities, data types and sizes, branch performance, loop characteristics, memory statistics, and instruction level parallelism. The architecture evaluation exam...
The Dynamic Trace Memoization Reuse Technique
In 9th PACT, pp. 92–99, IEEE CS, 2000
Cited by 3 (2 self)
Dynamic Trace Memoization (DTM) is a reuse technique that employs memoization tables to skip the execution of sequences of redundant instructions. For the SPECInt95 benchmark programs, DTM delivers performance improvements from 5% to 21%, with an average of 9.3%. Moreover, for a subset of the SPECInt95 benchmarks, DTM attains twice the average speedup of two other previously proposed reuse mechanisms.
1. Introduction
Redundant instructions represent a significant portion of the instructions executed by a program [1]. Redundant instructions are dynamic instances of the same static instruction that execute with the same operand values and therefore produce the same result. Instruction redundancy originates, for example, in expressions within loops or procedures that repeatedly compute on the same or nearly identical data. Compiler transformations such as common subexpression elimination and loop-invariant code motion [2] may not be effective in finding redundancies that manifest them...
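The memoization idea behind DTM can be illustrated with a table keyed on a trace's starting PC and the values of the registers it reads: when the same key recurs, the recorded outputs and next PC are applied instead of re-executing the trace. This is a simplified software model under our own assumptions. `TraceMemo`, `run` and the example trace are hypothetical; the paper's hardware tables, trace construction and replacement policy are not modelled. Reuse is only correct if `live_ins` lists every register the trace reads.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# A trace body: takes a register-file snapshot, returns (registers written, next PC).
TraceFn = Callable[[Dict[str, int]], Tuple[Dict[str, int], int]]


@dataclass
class TraceMemo:
    """Hypothetical trace memoization table.

    Key: (start PC, values of the trace's input registers).
    Value: (output register values, PC following the trace).
    """
    table: dict = field(default_factory=dict)
    hits: int = 0
    misses: int = 0

    def run(self, pc: int, live_ins: tuple, regs: Dict[str, int], execute: TraceFn) -> int:
        key = (pc, tuple(regs[r] for r in live_ins))
        if key in self.table:
            # Redundant instance: skip execution and apply the memoized results.
            self.hits += 1
            outs, next_pc = self.table[key]
        else:
            # New input values: execute the trace and record its outcome.
            self.misses += 1
            outs, next_pc = execute(dict(regs))
            self.table[key] = (outs, next_pc)
        regs.update(outs)
        return next_pc


def trace_0x40(regs: Dict[str, int]) -> Tuple[Dict[str, int], int]:
    """Hypothetical two-instruction trace: r3 = r1 * 4; r4 = r3 + r2."""
    r3 = regs["r1"] * 4
    r4 = r3 + regs["r2"]
    return {"r3": r3, "r4": r4}, 0x48


memo = TraceMemo()
regs = {"r1": 5, "r2": 7, "r3": 0, "r4": 0}
for _ in range(3):  # the same trace runs three times with unchanged inputs
    next_pc = memo.run(0x40, ("r1", "r2"), regs, trace_0x40)
```

After the loop, the trace has executed once and been reused twice.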
Using Theorem Proving and Algorithmic Decision Procedures for Large-Scale System Verification
2005
Cited by 1 (0 self)
To the few people who believed I could do it even when I myself didn’t
Acknowledgments
This dissertation has been shaped by many people, including my teachers, collaborators, friends, and family. I would like to take this opportunity to acknowledge the influence they have had in my development as a person and as a scientist. First and foremost, I wish to thank my advisor J Strother Moore. J is an amazing advisor, a marvellous collaborator, an insightful researcher, an empathetic teacher, and a truly great human being. He gave me just the right balance of freedom, encouragement, and direction to guide the course of this research. My stimulating discussions with him made the act of research an experience of pure enjoyment, and helped pull me out of many low ebbs. At one point I used to believe that whenever I was stuck with a problem, one meeting with J would get me back on track. Furthermore, my times together with J and Jo during Thanksgivings and other occasions always made me feel part of his family. There was no problem, technical or otherwise, that I could not discuss with J, and there was no time when
Renaming Techniques: To boost processor and system performance, virtually all recent superscalars rename registers.
Register renaming is a technique to remove false data dependencies, namely write-after-read (WAR) and write-after-write (WAW) dependencies, that occur in straight-line code between register operands of subsequent instructions [1-3]. By eliminating the related precedence requirements in the execution sequence of the instructions, renaming increases the average number of instructions that are available for parallel execution per cycle. This results in increased IPC (number of instructions executed per cycle). Identifying and exploring the design space of register renaming leads to a comprehensive understanding of this intricate technique.
Advanced Aspects of Computer Architecture (Aspects avancés d'architecture des ordinateurs)
1998
... nested within one another, in the sense that all the data at one level are also found at the level below. In this way, each level maps addresses from a large memory set onto a smaller set that is faster to access and higher in the hierarchy. The objective of this work is to study the behavior of the first level of the memory hierarchy: the cache. This study will be carried out by simulating the cache.
1 The Cache
The operation of a cache is illustrated in Figure 1. Memory is divided into small fixed-size blocks (generally a few hundred bytes). The cache, which is much smaller than memory, contains copies of some memory blocks. When the processor accesses memory (through an address), it goes through the cache controller. If the data are present in the cache (cache hit), they are returned immediately. Otherwise, we are dealing with a ...
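The study of the cache by simulation can be illustrated with a small trace-driven simulator. The text does not state the cache organization, so this sketch assumes a direct-mapped cache in which each memory block can occupy exactly one line; `simulate_direct_mapped` and its parameters are hypothetical names.

```python
from __future__ import annotations


def simulate_direct_mapped(addresses, cache_bytes, block_bytes):
    """Count hits and misses of a direct-mapped cache over a trace of byte addresses."""
    num_lines = cache_bytes // block_bytes
    tags = [None] * num_lines           # one tag per line; None means the line is empty
    hits = misses = 0
    for addr in addresses:
        block = addr // block_bytes     # memory block containing this byte
        index = block % num_lines       # the only line this block may occupy
        tag = block // num_lines        # tells apart blocks that share that line
        if tags[index] == tag:
            hits += 1                   # cache hit: data returned immediately
        else:
            misses += 1                 # cache miss: fetch the block, evicting the old one
            tags[index] = tag
    return hits, misses


# 256-byte cache with 64-byte blocks (4 lines); address 256 maps to the same line as 0.
hits, misses = simulate_direct_mapped([0, 4, 64, 0, 256, 0], cache_bytes=256, block_bytes=64)
# (hits, misses) == (2, 4)
```

The final access to address 0 misses because the block at address 256 evicted it, a conflict miss that a more associative cache could avoid.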
Deterministic Features in the Evolution of Microprocessors
The fierce demand for higher performance has provoked a dramatic evolution in the field of microprocessors. In this paper we show that this immense performance increase could only be achieved by the successive introduction of temporal, issue, and intra-instruction parallelism, in such a way that exploiting the full potential along one dimension gives rise to the introduction of parallelism along a further dimension. Moreover, the debut of each basic technique used to implement parallelism along a given dimension inevitably calls for further innovative techniques in order to fully capitalize on the potential of the basic technique. In this way, an underlying deterministic framework can be identified for the fascinating evolution of microprocessors, which we present in this paper.
Run-Time Optimization Architecture
2002
Each new generation of wide-issue processors continues to achieve higher performance by exploiting greater amounts of instruction-level parallelism than the previous generation. Dynamic techniques such as out-of-order execution with hardware speculation have proven effective at increasing instruction throughput, parallelism, and utilization of processor resources. Run-time optimization techniques promise to enable an even higher level of performance by applying aggressive transformations at run-time that optimize across module boundaries, adapt code regions to changing input patterns, and customize code sequences for the underlying microarchitecture. This thesis presents a hardware mechanism for generating and deploying run-time optimized code. The system exploits program execution phasing by automatically detecting and optimizing the instruction sequences that comprise the phase, called a hot spot. The hardware mechanism can be viewed as a filtering system that resides after the retirement stage of the processor pipeline, accepts an instruction execution stream as input, and produces instruction profiles and sets of linked, optimized traces as output. The code deployment mechanism uses an extension to the branch prediction mechanism to migrate execution into the new code without modifying the original code. These new components do not add delay to the execution of
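The hot-spot detection step, a filter placed after retirement that watches the executed instruction stream, can be approximated for illustration by counting retired branch addresses in fixed windows. This is our own simplification, not the thesis's hardware. `detect_hot_spot` and its parameters `window`, `min_count` and `hot_fraction` are hypothetical, and trace formation, optimization and deployment are not modelled.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Optional, Set


def detect_hot_spot(
    retired_branch_pcs: Iterable[int],
    window: int = 1000,
    min_count: int = 16,
    hot_fraction: float = 0.9,
) -> Optional[Set[int]]:
    """Hypothetical post-retirement hot-spot filter.

    Retired branch addresses are counted in fixed windows. If the branches seen at
    least `min_count` times in a window together account for at least
    `hot_fraction` of that window, they are reported as a hot spot.
    """
    counts: Counter = Counter()
    seen = 0
    for pc in retired_branch_pcs:
        counts[pc] += 1
        seen += 1
        if seen == window:
            candidates = {b for b, n in counts.items() if n >= min_count}
            covered = sum(counts[b] for b in candidates)
            if candidates and covered >= hot_fraction * window:
                return candidates        # stable phase: hand these branches to the optimizer
            counts.clear()               # otherwise start a fresh window
            seen = 0
    return None


loop = [0x100, 0x120] * 500              # a tight loop with two branches
hot = detect_hot_spot(loop)              # hot == {0x100, 0x120}
```

Requiring the candidates to cover most of a window is one simple way to approximate "the instruction sequences that comprise the phase"; a stream with no dominant branches yields no hot spot.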
Algorithm, Proof and Performances of a new Division of Floating Point Expansions
1999
We present in this work a new algorithm for the division of floating-point expansions. A floating-point expansion is a multiple-precision data type whose arithmetic operators use the processor's floating-point unit for core computations instead of the integer unit. Research on this subject has arisen recently from the observation that the floating-point unit is becoming an increasingly efficient part of modern computers. Many simple arithmetic operators and some very useful geometric operators have already been presented for expansions, yet previous work presented only a very simple division algorithm; here we present a new one. We also take this opportunity to extend the set of geometric operators with Bareiss' determinant on matrices of size between 3 and 10. Running times of different determinant algorithms on different machines are compared with those of other multiprecision packages, including GMP, CADNA, and a computational geometry package working with modular arithmetic.
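Expansion arithmetic rests on error-free transformations that run entirely on the floating-point unit. As background only, the sketch below shows Knuth's TwoSum and Shewchuk's Grow-Expansion, which add a float to an expansion exactly. It is not the paper's division algorithm, which the abstract does not spell out. It assumes IEEE 754 binary64 with round-to-nearest (Python floats) and no overflow.

```python
from __future__ import annotations


def two_sum(a: float, b: float) -> tuple[float, float]:
    """Knuth's TwoSum: s = fl(a + b) and the exact rounding error e, so a + b == s + e.

    Assumes IEEE 754 binary64 with round-to-nearest (Python floats) and no overflow.
    """
    s = a + b
    b_virtual = s - a
    a_virtual = s - b_virtual
    e = (a - a_virtual) + (b - b_virtual)
    return s, e


def grow_expansion(e: list[float], b: float) -> list[float]:
    """Shewchuk's Grow-Expansion: exactly add the float b to the expansion e.

    e holds nonoverlapping components ordered by increasing magnitude; the result
    has the same form (it may contain zero components) and its exact sum is sum(e) + b.
    """
    out: list[float] = []
    q = b
    for component in e:
        q, h = two_sum(q, component)   # h is the exact error left behind at this step
        out.append(h)
    out.append(q)                      # q carries the rounded running sum
    return out


x = grow_expansion(grow_expansion([1.0], 1e-20), 1e20)   # [1e-20, 1.0, 1e20]
```

A plain double would round 1e20 + 1.0 + 1e-20 to 1e20, whereas the three-component expansion keeps every bit of the exact sum.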

