Results 1 -
4 of
4
Iterative modulo scheduling: An algorithm for software pipelining loops
- In Proceedings of the 27th Annual International Symposium on Microarchitecture
, 1994
"... Modulo scheduling is a framework within which a wide variety of algorithms and heuristics may be defined for software pipelining innermost loops. This paper presents a practical algorithm, iterative modulo scheduling, that is capable of dealing with realistic machine models. This paper also characte ..."
Abstract
-
Cited by 263 (2 self)
- Add to MetaCart
Modulo scheduling is a framework within which a wide variety of algorithms and heuristics may be defined for software pipelining innermost loops. This paper presents a practical algorithm, iterative modulo scheduling, that is capable of dealing with realistic machine models. This paper also characterizes the algorithm in terms of the quality of the generated schedules as well the computational expense incurred.
Hardware-software co-design of embedded systems
- PROCEEDINGS OF THE IEEE
, 1994
"... This paper surveys the design of embedded computer systems, which use software running on programmable computers to im-plement system functions. Creating an embedded computer system which meets its performance, cost, and design time goals is a hardware-software co-design problewhe design of the hard ..."
Abstract
-
Cited by 145 (5 self)
- Add to MetaCart
This paper surveys the design of embedded computer systems, which use software running on programmable computers to im-plement system functions. Creating an embedded computer system which meets its performance, cost, and design time goals is a hardware-software co-design problewhe design of the hard-ware and software components influence each other. This paper emphasizes a historical approach to show the relationships be-tween well-understood design problems and the as-yet unsolved problems in co-design. We describe the relationship between hard-ware and sofhvare architecture in the early stages of embedded system design. We describe analysis techniques for hardware and software relevant to the architectural choices required for hard-ware-software co-design. We also describe design and synthesis techniques for co-design and related problems.
MULTIPROCESSOR SCHEDULING TO ACCOUNT FOR INTERPROCESSOR COMMUNICATION
, 1991
"... Interprocessor communication (PC) overheads have emerged as the major performance limitation in parallel processing systems, due to the transmission delays, synchronization overheads, and conflicts for shared communication resources created by data exchange. Accounting for these overheads is essenti ..."
Abstract
-
Cited by 64 (11 self)
- Add to MetaCart
Interprocessor communication (PC) overheads have emerged as the major performance limitation in parallel processing systems, due to the transmission delays, synchronization overheads, and conflicts for shared communication resources created by data exchange. Accounting for these overheads is essential for attaining efficient hardware utilization. This thesis introduces two new compile-time heuristics for scheduling precedence graphs onto multiprocessor architectures, which account for interprocessor communication overheads and interconnection constraints in the architecture. These algorithms perform scheduling and routing simultaneously to account for irregular interprocessor interconnections, and schedule all communications as well as all computations to eliminate shared resource contention. The first technique, called dynamic-level scheduling, modifies the classical HLFET list scheduling strategy to account for IPC and synchronization overheads. By using dynamically changing priorities to match nodes and processors at each step, this technique attains an equitable tradeoff between load balancing and interprocessor communication cost. This method is fast, flexible, widely targetable, and displays promising perforrnance. The second technique, called declustering, establishes a parallelism hierarchy upon the precedence graph using graph-analysis techniques which explicitly address the tradeoff between exploiting parallelism and incurring communication cost. By systematically decomposing this hierarchy, the declustering process exposes parallelism instances in order of importance, assuring efficient use of the available processing resources. In contrast with traditional clustering schemes, this technique can adjust the level of cluster granularity to suit the characteristics of the specified architecture, leading to a more effective solution.

