Results 1 - 10 of 131
Thread-Sensitive Modulo Scheduling for Multicore Processors
"... This paper describes a generalisation of modulo scheduling to parallelise loops for SpMT processors that exploits simultaneously both instruction-level parallelism and thread-level parallelism while preserving the simplicity and effectiveness of modulo scheduling. Our generalisation is simple, drops ..."
Reuse-Aware Modulo Scheduling for Stream Processors
"... This paper presents reuse-aware modulo scheduling to maximize stream reuse and improve concurrency for stream-level loops running on stream processors. The novelty lies in the development of a new representation for an unrolled and software-pipelined stream-level loop using a set of reu ..."
Distributed Modulo Scheduling
1999
"... Wide-issue ILP machines can be built using the VLIW approach as many of the hardware complexities found in superscalar processors can be transferred to the compiler. However, the scalability of VLIW architectures is still constrained by the size and number of ports of the register file required by a ..."
Cited by 34 (6 self)
Modulo Scheduling With Isomorphic Control Transformations
1994
"... ... over other software pipelining techniques based on global scheduling. The ICTs are applied to Modulo Scheduling to schedule loops with conditional branches. Experimental results show that this approach allows more flexible scheduling and thus better performance than Modulo Scheduling with Hierar ..."
Cited by 29 (0 self)
with Hierarchical Reduction. Modulo Scheduling with ICTs targets processors with no or limited support for conditional execution such as superscalar processors. However, in processors that do not require instruction set compatibility, support for Predicated Execution can be used. This dissertation shows that Modulo
A unified modulo scheduling and register allocation technique for clustered processors
In Proceedings of the 10th International Conference on Parallel Architectures and Compilation Techniques, 2001
"... This work presents a modulo scheduling framework for clustered ILP processors that integrates the cluster assignment, instruction scheduling and register allocation steps in a single phase. This unified approach is more effective than traditional approaches based on sequentially performing some (or ..."
Cited by 20 (4 self)
Unrolling-Based Optimizations for Modulo Scheduling
In Proceedings of the 28th International Symposium on Microarchitecture, 1995
"... Modulo scheduling is a method for overlapping successive iterations of a loop in order to find sufficient instruction-level parallelism to fully utilize high-issue-rate processors. The achieved throughput of a modulo scheduled loop depends on the resource requirements, the dependence pattern, and the reg ..."
Cited by 34 (3 self)
Modulo Schedule Buffers
In Proc. of the 34th Annual International Symposium on Microarchitecture, 2001
"... As VLIW/EPIC processors are increasingly used in real-time, signal-processing, and embedded applications, the importance of minimizing code size and reducing power is growing. This paper describes a new architectural mechanism, called the Modulo Schedule Buffers, that provides an elegant interface fo ..."
Cited by 3 (0 self)
Register constrained modulo scheduling
IEEE Trans. Parallel Distrib. Syst.
"... Software pipelining is an instruction scheduling technique that exploits the instruction level parallelism (ILP) available in loops by overlapping operations from various successive loop iterations. The main drawback of aggressive software pipelining techniques is their high register requir ..."
Cited by 2 (0 self)
propose a set of heuristics to improve the spilling process and to better decide between adding spill code or directly decreasing the execution rate of iterations. The experimental evaluation, over a large number of representative loops and for a processor configuration, reports an increase in performance
Preprocessing Strategy for Effective Modulo Scheduling on Multi-Issue Digital Signal Processors
"... To achieve high resource utilization for multi-issue Digital Signal Processors (DSPs), production compilers commonly include variants of the iterative modulo scheduling algorithm. However, excessive cyclic data dependences, which exist in communication and media processing loops, often pre ..."
Cited by 2 (2 self)
A Fault-Tolerant Permutation Network Modulo Arithmetic Processor
IEEE Trans. on VLSI Systems, 1994
"... Conventional fault-tolerant modulo arithmetic processors rely on the properties of a residue number system with L redundant moduli to detect up to L / 2 errors. In this paper, we propose a new scheme that combines r-out-of-s residue codes with Berger codes to concurrently detect any number ..."
Cited by 2 (0 self)