Results 1 - 10 of 655
Code Positioning for VLIW Architectures
"... Several studies have considered reducing instruction cache misses and branch penalty stall cycles by means of various forms of code placement. Most proposed approaches rearrange procedures or basic blocks in order to speed up execution on sequential architectures with branch prediction. Moreover, ..."
Execution-based Scheduling for VLIW Architectures
- In Euro-Par ’99 Parallel Processing – 5th International Euro-Par Conference, number 1685 in Lecture Notes in Computer Science, 1999
"... We describe a new dynamic software scheduling technique for VLIW architectures, which compiles into VLIW code the program paths that are actually executed. Unlike trace processors, or DIF, the technique executes operations speculatively on multiple paths through the code, is resilient to branch ..."
Cited by 22 (17 self)
Instruction Scheduling for Clustered VLIW Architectures
- In ISSS ’00: Proceedings of the 13th International Symposium on System Synthesis, 2000
Partitioned Schedules for Clustered VLIW Architectures
- In Proc. 12th International Parallel Processing Symposium and 9th Symposium on Parallel and Distributed Processing (IPPS/SPDP'1998), 1998
"... This paper presents results on a new approach to partitioning a modulo-scheduled loop for distributed execution on parallel clusters of functional units organized as a VLIW machine. A distinctive characteristic of this architecture is the use of register files organized by means of queues, which res ..."
Cited by 11 (4 self)
Dynamically Trace Scheduled VLIW Architectures
- Proc. of the HPCN ’98, in Lecture Notes in Computer Science, 1998
"... This paper presents a new architecture organisation, the dynamically trace scheduled VLIW (DTSVLIW), that can be used to implement machines that execute the code of current RISC or CISC instruction set architectures in a VLIW fashion, with backward code compatibility. ..."
Cited by 5 (5 self)
Clustered VLIW Architectures with Predicated Switching
- In Proceedings of the Design Automation Conference, 2001
"... In order to meet the high throughput requirements of applications exhibiting high ILP, VLIW ASIPs may increasingly include large numbers of functional units (FUs). Unfortunately, 'switching' data through register files shared by large numbers of FUs quickly becomes a dominant cost / performance f ..."
Cited by 4 (1 self)
A VLIW Architecture for Logarithmic Arithmetic
, 2003
"... The Logarithmic Number System (LNS) is an alternative to IEEE-754 standard floating-point arithmetic. LNS multiply, divide and square root are easier than IEEE-754 and naturally belong to the same class of one-cycle-latency instructions like integer addition, subtraction and shifting. LNS addition ..."
Cited by 1 (0 self)
Instruction Fetch Mechanisms for VLIW Architectures with Compressed Encodings
- Proc. 29th Ann. Int’l Symp. Microarchitecture (MICRO29), 1996
"... VLIW architectures use very wide instruction words in conjunction with high bandwidth to the instruction cache to achieve multiple instruction issue. This report uses the TINKER experimental testbed to examine instruction fetch and instruction cache mechanisms for VLIWs. A compressed instruction enc ..."
Cited by 28 (8 self)
Code Generation for Processors with VLIW Architecture
, 1998
Increasing complexity and modern standards for wireless multimedia applications require high performance data processing. A basic challenge is to reduce the development time of systems with growing complexity. Recently, new research efforts emerged on the edge between hardware and software system design, to develop high-quality code generation tools.
Loop Fusion for Clustered VLIW Architectures
- In Proceedings of the Joint Conference on Languages, Compilers and Tools for Embedded Systems (LCTES/SCOPES ’02), 2002
Embedded systems require maximum performance from a processor within significant constraints in power consumption and chip cost. Using software pipelining, high-performance digital signal processors can often exploit considerable instruction-level parallelism (ILP), and thus significantly improve performance. However, software pipelining, in some instances, hinders the goals of low power consumption and low chip cost. Specifically, the registers required by a software pipelined loop may exceed the size of the physical register set.