The Good Block: Hardware/Software Design for Composable, Block-Atomic Processors
Abstract
Power consumption, complexity, and on-chip latency are forcing computer systems to exploit more parallelism efficiently. Explicit Dataflow Graph Execution (EDGE) architectures seek to expose parallelism by dividing programs into blocks of efficient dataflow operations, exposing inter- and intra-block concurrency. This paper studies the balance of complexity and capability between EDGE architectures and compilers. We address three main questions. (1) What are the appropriate block granularities for achieving high performance efficiently? (2) What are good block instruction selection policies? (3) What architecture and compiler support do these designs require? Our results show that the compiler requires multiple block sizes to adapt applications to block-atomic hardware and achieve high performance. Although the architecture for a single size is simpler, the additions for variable sizes are modest and ease hardware configuration. We propose hand-crafted and learned compiler policies for block formation. We find that the best policies provide significant advantages, up to a factor of 3 in some configurations. Policies vary based on (1) the amount of parallelism inherent in the application, e.g., for integer and numerical applications, and (2) the available parallel resources. The resulting configurable architecture and compiler efficiently expose and exploit software and hardware parallelism.

