Results 11 - 20
of
60
Dynamic Hardware/Software Partitioning: A First Approach
, 2003
"... Partitioning an application among software running on a microprocessor and hardware co-processors in on-chip configurable logic has been shown to improve performance and energy consumption in embedded systems. Meanwhile, dynamic software optimization methods have shown the usefulness and feasibility ..."
Abstract
-
Cited by 27 (10 self)
- Add to MetaCart
Partitioning an application among software running on a microprocessor and hardware co-processors in on-chip configurable logic has been shown to improve performance and energy consumption in embedded systems. Meanwhile, dynamic software optimization methods have shown the usefulness and feasibility of runtime program optimization, but those optimizations do not achieve as much as partitioning. We introduce a first approach to dynamic hardware/software partitioning. We describe our system architecture and initial onchip tools, including profiler, decompiler, synthesis, and placement and routing tools for a simplified configurable logic fabric, able to perform dynamic partitioning of real benchmarks. We show speedups averaging 2.6 for five benchmarks taken from Powerstone, NetBench, and our own benchmarks.
Exploiting VLIW Schedule Slacks for Dynamic and Leakage Energy Reduction
- In Proceedings of the 34th Annual International Symposium on Micro-architecture
, 2001
"... The mobile computing device market is projected to grow to 16.8 million units in 2004, representing an average annual growth rate of 28% over the five year forecast period [5]. This brings the technologies that optimize system energy to the forefront. As circuits continue to scale in future, it woul ..."
Abstract
-
Cited by 26 (4 self)
- Add to MetaCart
The mobile computing device market is projected to grow to 16.8 million units in 2004, representing an average annual growth rate of 28% over the five year forecast period [5]. This brings the technologies that optimize system energy to the forefront. As circuits continue to scale in future, it would be important to optimize both leakage and dynamic energy. Effective optimization of leakage and dynamic energy consumption requires a vertical integration of techniques spanning from circuit to software levels.
Considering Power Variations of DVS Processing Elements for Energy Minimisation in Distributed Systems
, 2001
"... Dynamic voltage scaling (DVS) is a powerful technique to reduce power dissipation in embedded systems. Some efficient DVS algorithms have been recently proposed for the energy reduction in distributed system. However, they achieve the energy savings solely by scaling the system task with respect to ..."
Abstract
-
Cited by 26 (10 self)
- Add to MetaCart
Dynamic voltage scaling (DVS) is a powerful technique to reduce power dissipation in embedded systems. Some efficient DVS algorithms have been recently proposed for the energy reduction in distributed system. However, they achieve the energy savings solely by scaling the system task with respect to the timing constraints, while neglecting that power varies among the tasks executed by DVS processing elements (DVS-PEs). In this paper we investigate the problem of considering DVS-PE power variations dependent on the executed tasks, during the synthesis of distributed embedded systems and its impact on the energy savings. Unlike previous approaches, which minimise the energy consumption by exploiting the available slack time without considering the PE power profiles, a new and fast heuristic for the voltage scaling problem is proposed, which improves the voltage selection for each task dependent on the individual power dissipation caused by that task. Experimental results show that energy reductions with up to 80.7% are achieved by integrating the proposed DVS algorithm, which considers the PE power profiles, into the co-synthesis of distributed systems.
Walkabout - A Retargetable Dynamic Binary Translation Framework
- In Proceedings of the 2002 Workshop on Binary Translation
, 2002
"... Dynamic compilation techniques have found a renaissance in recent years due to their use in high-performance implementations of the Java(TM) language. Techniques originally developed for use in virtual machines for such object-oriented languages as Smalltalk are now commonly used in Java virtual mac ..."
Abstract
-
Cited by 25 (1 self)
- Add to MetaCart
Dynamic compilation techniques have found a renaissance in recent years due to their use in high-performance implementations of the Java(TM) language. Techniques originally developed for use in virtual machines for such object-oriented languages as Smalltalk are now commonly used in Java virtual machines (JVM(TM)) and Java just-in-time compilers. These techniques have also been applied to binary translation in recent years, most commonly appearing in binary optimizers for a given platform that improve the performance of binary programs while they execute.
Performance Characterization of a Hardware Mechanism for Dynamic Optimization
- In 34 th International Symposium on Microarchitecture
, 2001
"... We evaluate the rePLay microarchitecture as a means for reducing application execution time by facilitating dynamic optimization. The framework contains a programmable optimization engine coupled with a hardware-based recovery mechanism. The optimization engine enables the dynamic optimizer to run c ..."
Abstract
-
Cited by 23 (3 self)
- Add to MetaCart
We evaluate the rePLay microarchitecture as a means for reducing application execution time by facilitating dynamic optimization. The framework contains a programmable optimization engine coupled with a hardware-based recovery mechanism. The optimization engine enables the dynamic optimizer to run concurrently with program execution. The recovery mechanism enables the optimizer to make speculative optimizations without requiring recovery code.
LLVA: A Low-level Virtual Instruction Set Architecture
- IN MICRO-36
, 2003
"... A virtual instruction set architecture (V-ISA) implemented via a processor-specific software translation layer can provide great flexibility to processor designers. Recent examples such as Crusoe and DAISY, however, have used existing hardware instruction sets as virtual ISAs, which complicates tran ..."
Abstract
-
Cited by 20 (3 self)
- Add to MetaCart
A virtual instruction set architecture (V-ISA) implemented via a processor-specific software translation layer can provide great flexibility to processor designers. Recent examples such as Crusoe and DAISY, however, have used existing hardware instruction sets as virtual ISAs, which complicates translation and optimization. In fact, there has been little research on specific designs for a virtual ISA for processors. This paper proposes a novel virtual ISA (LLVA) and a translation strategy for implementing it on arbitrary hardware. The instruction set is typed, uses an infinite virtual register set in Static Single Assignment form, and provides explicit control-flow and dataflow information, and yet uses low-level operations closely matched to traditional hardware. It includes novel mechanisms to allow more flexible optimization of native code, including a flexible exception model and minor constraints on self-modifying code. We propose a translation strategy that enables offline translation and transparent offline caching of native code and profile information, while remaining completely OS-independent. It also supports optimizations directly on the representation at install-time, runtime, and offline between executions. We show experimentally that the virtual ISA is compact, it is closely matched to ordinary hardware instruction sets, and permits very fast code generation, yet has enough high-level information to permit sophisticated program analyses and optimizations.
Hardware/Software Partitioning of Software Binaries
, 2002
"... Partitioning an embedded system application among a microprocessor and custom hardware has been shown to improve the performance, power or energy of numerous examples. The advent of single-chip microprocessor/FPGA platforms makes such partitioning even more attractive. Previous partitioning approach ..."
Abstract
-
Cited by 20 (13 self)
- Add to MetaCart
Partitioning an embedded system application among a microprocessor and custom hardware has been shown to improve the performance, power or energy of numerous examples. The advent of single-chip microprocessor/FPGA platforms makes such partitioning even more attractive. Previous partitioning approaches have partitioned sequential program source code, such as C or C++. We introduce a new approach that partitions at the software binary level. Although source code partitioning is preferable from a purely technical viewpoint, binary-level partitioning provides several very practical benefits for commercial acceptance. We demonstrate that binary-level partitioning yields competitive speedup results compared to source-level partitioning, achieving an average speedup of 1.4 compared to 1.5 for eight benchmarks partitioned on a single-chip microprocessor/FPGA device.
A Configurable Logic Architecture for Dynamic Hardware/Software Partitioning
, 2004
"... In previous work, we showed the benefits and feasibility of having a processor dynamically partition its executing software such that critical software kernels are transparently partitioned to execute as a hardware coprocessor on configurable logic -- an approach we call warp processing. The configu ..."
Abstract
-
Cited by 14 (3 self)
- Add to MetaCart
In previous work, we showed the benefits and feasibility of having a processor dynamically partition its executing software such that critical software kernels are transparently partitioned to execute as a hardware coprocessor on configurable logic -- an approach we call warp processing. The configurable logic place and route step is the most computationally intensive part of such hardware/software partitioning, normally running for many minutes or hours on powerful desktop processors. In contrast, dynamic partitioning requires place and route to execute in just seconds and on a lean embedded processor. We have therefore designed a configurable logic architecture specifically for dynamic hardware/software partitioning. Through experiments with popular benchmarks, we show that by specifically focusing on the goal of software kernel speedup when designing the FPGA architecture, rather than on the more general goal of ASIC prototyping, we can perform place and route for our architecture 50 times faster, using 10,000 times less data memory, and 1,000 times less code memory, than popular commercial tools mapping to commercial configurable logic. Yet, we show that we obtain speedups (2x on average, and as much as 4x) and energy savings (33% on average, and up to 74%) when partitioning even just one loop, which are comparable to commercial tools and fabrics. Thus, our configurable logic architecture represents a good candidate for platforms that will support dynamic hardware/software partitioning, and enables ultra-fast desktop tools for hardware/software partitioning, and even for fast configurable logic design in general.
Binary translation and architecture convergence issues for IBM System/390
- In Proc. of the International Conference on Supercomputing 2000, Santa Fe, NM
, 2000
"... We describe the design issues in an implementation of the ESA/390 architecture based on binary translation to a very long instruction word (VLIW) processor. During binary translation, complex ESA/390 instructions are decomposed into instruction “primitives ” which are then scheduled onto a wide-issu ..."
Abstract
-
Cited by 10 (8 self)
- Add to MetaCart
We describe the design issues in an implementation of the ESA/390 architecture based on binary translation to a very long instruction word (VLIW) processor. During binary translation, complex ESA/390 instructions are decomposed into instruction “primitives ” which are then scheduled onto a wide-issue machine. The aim is to achieve high instruction level parallelism due to the increased scheduling and optimization opportunities which can be exploited by binary translation software, combined with the efficiency of long instruction word architectures. A further aim is to study the feasibility of a common execution platform for different instruction set architectures, such as ESA/390, RS/6000, AS/400 and the Java Virtual Machine, so that multiple systems can be built around a common execution platform. 1.
Softspec: Software-based Speculative Parallelism
- In 3rd ACM Workshop on Feedback-Directed and Dynamic Optimization (FDDO-3
, 2000
"... We present Softspec, a technique for parallelizing sequential applications using a hybrid compile-time and run-time technique. Softspec parallelizes loops whose memory references are stride-predictable. By detecting and speculatively executing potential parallelism at runtime Softspec eliminates the ..."
Abstract
-
Cited by 10 (1 self)
- Add to MetaCart
We present Softspec, a technique for parallelizing sequential applications using a hybrid compile-time and run-time technique. Softspec parallelizes loops whose memory references are stride-predictable. By detecting and speculatively executing potential parallelism at runtime Softspec eliminates the need for complex program analysis required by parallelizing compilers. By using runtime information Softspec succeeds in parallelizing loops whose memory access patterns are statically indeterminable. For example, Softspec can parallelize while loops with un-analyzable exit conditions, linked list traversals, and sparse matrix applications with predictable memory patterns. We show performance results using our software prototype implementation.

