Meld Scheduling: A Technique for Relaxing Scheduling Constraints
- Int. J. Parallel Programming, 1998
Cited by 4 (2 self)
Meld scheduling melds the schedules of neighboring scheduling regions to respect latencies of operations issued in one region but completing after control transfers to the other. In contrast, conventional schedulers ignore latency constraints from other regions leading to potentially avoidable stalls in an interlocked (superscalar) machine or incorrect schedules for non-interlocked (VLIW) machines. Alternatively, schedulers that conservatively require all operations to complete before the branch takes effect produce inefficient schedules.
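The boundary problem described above can be illustrated with a small sketch. It is not the paper's algorithm, only a simplified model assuming a single predecessor region, one result register per operation, and hypothetical names (`Op`, `dangling_latencies`, `earliest_issue`). It computes how many cycles each result is still outstanding when control leaves a region, and the earliest cycle at which an operation in the successor region may safely read those results.

```python
from typing import Dict, List, NamedTuple


class Op(NamedTuple):
    dest: str     # register written by the operation (hypothetical model)
    issue: int    # issue cycle, relative to the start of the region
    latency: int  # cycles until the result becomes available


def dangling_latencies(ops: List[Op], region_length: int) -> Dict[str, int]:
    """Cycles each result is still outstanding after control leaves the region."""
    remaining: Dict[str, int] = {}
    for op in ops:
        left = op.issue + op.latency - region_length
        if left > 0:
            # Keep the largest outstanding latency per register.
            remaining[op.dest] = max(left, remaining.get(op.dest, 0))
    return remaining


def earliest_issue(sources: List[str], dangling: Dict[str, int]) -> int:
    """Earliest cycle in the successor region at which all sources are ready."""
    return max((dangling.get(s, 0) for s in sources), default=0)


if __name__ == "__main__":
    predecessor = [Op("r1", issue=0, latency=2), Op("r2", issue=3, latency=3)]
    dangling = dangling_latencies(predecessor, region_length=4)
    print(dangling, earliest_issue(["r1", "r2"], dangling))
```

A scheduler that ignores `dangling` would place a reader of `r2` at cycle 0 of the successor region, which stalls an interlocked machine and is incorrect on a non-interlocked one; a scheduler that waits for every operation to complete before the branch would lengthen the predecessor region instead.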
Exploiting Conditional Instructions in Code Generation for Embedded VLIW Processors
- Design, Automation, and Test in Europe, 1999
Cited by 2 (0 self)
This paper presents a new code optimization technique for a class of embedded processors. Modern embedded processor architectures show deep instruction pipelines and highly parallel VLIW-like instruction sets. For such architectures, any change in the control flow of a machine program due to a conditional jump may cause a significant code performance penalty. Therefore, the instruction sets of recent VLIW machines offer support for branch-free execution of conditional statements in the form of so-called conditional instructions. Whether an if-then-else statement is implemented by a conditional jump scheme or by conditional instructions has a strong impact on its worst-case execution time. However, the optimal selection is difficult, particularly for nested conditionals. We present a dynamic programming technique for selecting the fastest implementation for nested if-then-else statements based on estimations. The efficacy is demonstrated for a real-life VLIW DSP.
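The selection problem for nested conditionals can be sketched as a bottom-up dynamic program over the statement tree. The cost model below is an assumption made for illustration and is not taken from the paper: a conditional jump costs a fixed hypothetical `JUMP_PENALTY` plus the slower of the two branches, while conditional instructions execute both branches in sequence and require everything nested inside to be branch-free as well. The names `Block`, `IfElse`, and `costs` are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple, Union

JUMP_PENALTY = 3  # hypothetical pipeline penalty of a conditional jump, in cycles


@dataclass
class Block:
    cycles: int  # estimated cycles of a straight-line block


@dataclass
class IfElse:
    cond_cycles: int  # estimated cycles to evaluate the condition
    then: "Stmt"
    orelse: "Stmt"


Stmt = Union[Block, IfElse]


def costs(stmt: Stmt) -> Tuple[int, int]:
    """Return (best worst-case cycles, worst-case cycles if fully branch-free)."""
    if isinstance(stmt, Block):
        return stmt.cycles, stmt.cycles
    then_best, then_pred = costs(stmt.then)
    else_best, else_pred = costs(stmt.orelse)
    # Conditional instructions: both branches are issued, nested code stays branch-free.
    predicated = stmt.cond_cycles + then_pred + else_pred
    # Conditional jump: pay the penalty, then only the slower branch counts.
    jump = stmt.cond_cycles + JUMP_PENALTY + max(then_best, else_best)
    return min(jump, predicated), predicated


if __name__ == "__main__":
    inner = IfElse(1, Block(1), Block(1))
    outer = IfElse(1, inner, Block(2))
    print(costs(outer))
```

Each node keeps two values because a parent that uses conditional instructions cannot reuse a child's jump-based optimum; tracking the fully branch-free cost separately is what lets the choice at each nesting level be made once.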
an overview of the architecture
- In Proceedings 2nd International Conference on Databases: Improving demic, 1981
Cited by 2 (0 self)
Garp’s on-chip, reconfigurable coprocessor was tailored specifically for accelerating loops of general-purpose software applications. Its novel features inspired a unique approach to automatic compilation from C. Various projects and products have been built using off-the-shelf field-programmable gate arrays (FPGAs) as compute accelerators for specific tasks. Such systems typically connect one or more FPGAs to the host computer via an I/O bus. Some have shown remarkable speedups, albeit limited to specific application domains. Many factors limit the general usefulness of such systems. Long reconfiguration times prevent acceleration of applications that spread their time over many different tasks. Low-bandwidth paths for data transfer limit
“Dynamic Optimization Infrastructure and Algorithms for IA-64,” Master of Science thesis, 2000
Cited by 1 (0 self)
Dynamic optimization refers to any program optimization performed after the initial static compile time. While typically not designed as a replacement for static optimization, dynamic optimization is a complementary optimization opportunity that leverages a vast amount of information that is not available until runtime. Dynamic optimization opens the doors for machine- and user-specific optimizations without the need for original source code. This thesis includes three contributions to the field of dynamic optimization. The first main goal is the survey of several current approaches to dynamic optimization, as well as its related topics of dynamic compilation, the postponement of some or all of compilation until runtime, and dynamic translation, the translation of an executable from one instruction-set architecture (ISA) to another. The second major goal of this thesis is the proposal of a new infrastructure for dynamic optimization in EPIC architectures. Several salient features of the EPIC ISA make it not only a good candidate for dynamic optimization but also make such optimizations essential for scalability on par with superscalar processors. By extending many of the existing approaches to dynamic optimization to allow for offline optimization, a new dynamic optimization system is proposed for EPIC architectures. For compatibility reasons, this new system is almost entirely a software-based solution, yet it utilizes the hardware-based profiling counters planned for future EPIC processors. Finally, the third contribution of this thesis is the introduction of several original optimization algorithms, which are specifically designed for implementation in a dynamic optimization infrastructure. Dynamic if-conversion is a lightweight runtime algorithm that converts control dependencies to data dependencies and vice versa at runtime, based on branch misprediction rates, and achieves a speedup of up to 17% for the SPECint95 benchmarks. Several other algorithms, such as predicate profiling, predicate promotion and false predicate path collapse, are designed to aid in offline instruction rescheduling.
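The idea of choosing between a branch and predicated code from observed misprediction rates can be sketched with a simple expected-cost comparison. This is not the thesis's algorithm; the function name `should_predicate`, its parameters, and the cost model (expected path cost plus misprediction penalty versus executing both paths) are assumptions made for illustration.

```python
def should_predicate(
    then_cycles: float,
    else_cycles: float,
    taken_prob: float,
    mispredict_rate: float,
    mispredict_penalty: float,
) -> bool:
    """Return True if the predicated form is expected to be cheaper than the branch."""
    # Branch form: only one path executes, but mispredictions flush the pipeline.
    branch_cost = (
        taken_prob * then_cycles
        + (1.0 - taken_prob) * else_cycles
        + mispredict_rate * mispredict_penalty
    )
    # Predicated form: both paths are issued, and no branch is left to mispredict.
    predicated_cost = then_cycles + else_cycles
    return predicated_cost < branch_cost


if __name__ == "__main__":
    # Poorly predicted branch: converting control to data dependence pays off.
    print(should_predicate(2, 2, 0.5, 0.30, 10))
    # Well-predicted branch: keeping (or restoring) the branch is cheaper.
    print(should_predicate(2, 2, 0.5, 0.01, 10))
```

Re-evaluating the same test as the profiled misprediction rate changes is what would allow a runtime system to convert in either direction.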
Three papers will be discussed in this section: Effective Compiler Support for Predicated Execution Using the Hyperblock, by Mahlke et al.; Disjoint Eager Execution: An Optimal Form of Speculative Execution, by Uht and Sindagi; and Selective Eager Execut
, 2000
close out the “processor front-end” section of the course. We will examine each of the papers in order.
2 Effective Compiler Support for Predicated Execution Using the Hyperblock
A hyperblock is a set of predicated basic blocks in which control may only enter from the top, but may exit from one or more locations [1]. We note that this implies that hyperblocks have only one entry point and contain no loops. This is significant because, if this were not the case, instructions within hyperblocks could not be arbitrarily reordered. The hyperblock is effectively a superblock with predicated instructions. The architecture on which the studies were performed is illustrated on page 3 of [1]. The central architectural element of interest is the predicate register file. The two bits in each predicate denote the true and false values of the predicate. These values are then used to determine whether instructions predicated on them will be executed. We note that the true and false values of a given predicate are not necessarily complements of each other. This is done to support predication down nested branches.
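A small sketch can show why the true and false values of a predicate need not be complements under nested branches. The semantics below are an assumption for illustration, not the definition from [1]: a predicate define is itself guarded, so when the guard is off both bits are cleared. The names `Pred` and `define` are hypothetical.

```python
from typing import NamedTuple


class Pred(NamedTuple):
    true: bool   # enables instructions on the "then" side of the branch
    false: bool  # enables instructions on the "else" side of the branch


def define(cond: bool, guard: bool = True) -> Pred:
    """Set both bits from cond, but only if the enclosing guard is on (assumed semantics)."""
    return Pred(true=guard and cond, false=guard and not cond)


if __name__ == "__main__":
    outer = define(cond=False)                    # outer branch not taken
    inner = define(cond=True, guard=outer.true)   # inner branch nested in the taken side
    print(outer, inner)
```

With the outer condition false, the inner predicate has both bits clear, so neither side of the nested branch executes; a single bit and its complement could not express that case.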
Applying Scalable Interprocedural Pointer Analysis to Embedded Applications
This paper evaluates six different types of interprocedural pointer analyses on 22 telecommunication and media applications and describes their application to an SRAM power reduction technique. This configurable SRAM provides differentiation of data access time and port counts within a single on-chip structure. Scheduling for configurable SRAM relies on interprocedural dependence analysis for safe assignment of program data objects to on-chip storage regions with little or no performance degradation. It thus provides a meaningful vehicle for exploring the applicability of and need for various interprocedural analysis techniques, as well as a demonstration of how compilation and hardware techniques can be combined to achieve optimization beyond that for performance.

