eMIPS, A Dynamically Extensible Processor
, 2006
Cited by 3 (2 self)
The eMIPS architecture can realize the performance benefits of application-specific hardware optimizations in a general-purpose, multi-user system environment using a dynamically extensible processor architecture. It allows multiple secure Extensions to load dynamically and to plug into the stages of a pipelined data path, thereby extending the core instruction set of the microprocessor. Extensions can also be used to realize on-chip peripherals and, if area permits, even multiple cores. The new functionality can be exploited by patching the binaries of the existing applications, without requiring any changes to the compilers. A working FPGA prototype and a flexible simulation system demonstrate speedups of 2x-3x on a set of applications that include games, real-time programs and the SPEC2000 integer benchmarks. eMIPS is the first realized workstation based entirely on a dynamically extensible processor that is safe for general-purpose, multi-user applications. By exposing the individual stages of the data path, eMIPS allows optimizations not previously possible. These include permitting safe and coherent accesses to memory from within an Extension, optimizing multi-branched blocks, and throwing precise and restartable exceptions from within an Extension.
N.: Resource sharing in custom instruction set extensions
- In: Proceedings of the 6th IEEE Symposium on Application Specific Processors
, 2008
Cited by 3 (1 self)
Customised processor performance generally increases as additional custom instructions are added. However, performance is not the only metric that modern systems must take into account; die area and energy efficiency are equally important. Resource sharing during synthesis of instruction set extensions (ISEs) can significantly reduce the die area and energy consumption of a customized processor. This may increase the number of custom instructions that can be synthesized with a given area budget. Resource sharing involves combining the graph representations of two or more ISEs which contain a similar sub-graph. This coupling of multiple sub-graphs, if performed naively, can increase the latency of the extension instructions considerably. And yet, as we show in this paper, an appropriate level of resource sharing provides a significantly simpler design with only modest increases in average latency for extension instructions. Based on existing resource-sharing techniques, this study presents a new heuristic that controls the degree of resource sharing between a given set of custom instructions. Our main contributions are the introduction of a parametric method for exploring the trade-offs that can be achieved between instruction latency and implementation complexity, and the coupling of design-space exploration with fast area-delay models for the operators comprising each ISE. We present experimental evidence that our heuristic exposes a broad range of design points, allowing advantageous trade-offs between die area and latency to be found and exploited.
Automating the Design of Embedded Domain Specific Accelerators
, 2008
Cited by 1 (1 self)
Domain specific architecture (DSA) design currently involves a lengthy process that requires significant designer knowledge, experience, and time in arriving at a suitable code generator and architecture for the target application suite. Given the stringent time-to-market constraints and the dynamic nature of embedded applications, designers face a huge challenge in delivering high-performance yet energy-efficient devices. In this study, we investigate an automatic design space exploration tool that employs an iterative technique known as “Stall Cycle Analysis” (SCA) to arrive at near-optimal energy-performance designs for various constraints, e.g., minimum area. For each design candidate in the process, the results of code generation and simulation are analyzed to identify bottlenecks to performance (or energy) and provide insight into adding or removing resources for further improvements. Second, we demonstrate the utility of exploration in pruning the design space effectively (from ≥1000 points to tens of points) for three application domains: face recognition, speech recognition, and wireless telephony. As compared to manual designs optimized for a particular metric, SCA automates the design of DSAs for minimum energy-delay product (17% improvement for wireless telephony), minimum area (75% smaller design for face recognition), or maximum performance (38% improvement for speech recognition). Finally, we discuss the impact of per-design code generation in reducing DSA design time from man-months to hours and in identifying superior design points through architectural design space exploration.
Graduate Group Chairperson COPYRIGHT
, 2008
“It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us....” – Charles Dickens
Enough of the worst! Let me reflect only on the best. My graduate career led me to the best of all possible friends, professors, and colleagues. My mind is sharper, my life richer, and my heart fuller thanks to the years I spent at Penn, learning, living, and loving more than any stretch of time prior to graduate school. I thank Amir Roth, whose passion for teaching and computer architecture converted me from a forced pupil in a required class to an inspired, budding researcher with a thirst for more knowledge and an empowering feeling that anything is possible in hardware. You have been a father and a friend. Thank you for setting the bar high and then helping me reach it. I thank my committee chair, Milo Martin, for always having an open door and
Ecole Polytechnique Federale de Lausanne
The Field Programmable Compressor Tree (FPCT) is a programmable compressor tree (e.g., a Wallace or Dadda Tree) intended for integration in an FPGA or other reconfigurable device. This paper presents a design space exploration (DSE) method that can be used to identify the best FPCT architecture for a given set of arithmetic benchmark circuits; in practice, an FPGA vendor can use the design space exploration to tailor the FPCT to meet the needs of the most important benchmark circuits of the vendor’s largest-volume clients. One novel feature of the DSE is the introduction of a metric called I/O utilization; we found that I/O utilization has a strong correlation with both the critical path delay and area of the benchmark circuits under study. Pruning the search space using I/O utilization allowed us to significantly reduce the number of FPCTs that must be synthesized and evaluated during the DSE, while giving high confidence that the best architectures are still explored. The DSE was applied to seven small-to-medium range benchmark circuits; one FPCT architecture was found that was 30% faster than the second best in terms of critical path delay, and only 3.34% larger than the smallest.

