StreamIt: A Language for Streaming Applications
- In International Conference on Compiler Construction, 2001
Cited by 235 (24 self)
We characterize high-performance streaming applications as a new and distinct domain of programs that is becoming increasingly important.
A Stream Compiler for Communication-Exposed Architectures
- In Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems, 2002
Cited by 61 (16 self)
With the increasing miniaturization of transistors, wire delays are becoming a dominant factor in microprocessor performance. To address this issue, a number of emerging architectures contain replicated processing units with software-exposed communication between one unit and another (e.g., Raw, iWarp, SmartMemories). However, for their use to be widespread, it will be necessary to develop compiler technology that enables a portable, high-level language to execute efficiently across a range of wire-exposed architectures.
RPU: A Programmable Ray Processing Unit for Realtime Ray Tracing
- ACM Trans. Graph., 2005
Cited by 55 (3 self)
with shadows and refractions), a Conference room (5.5 fps, without shadows), reflective and refractive Spheres-RT in an office (4.5 fps), and UT2003, a scene from a current computer game (7.5 fps, precomputed illumination).
Recursive ray tracing is a simple yet powerful and general approach for accurately computing global light transport and rendering high-quality images. While recent algorithmic improvements and optimized parallel software implementations have increased ray tracing performance to realtime levels, no compact and programmable hardware solution has been available yet. This paper describes the architecture and a prototype implementation of a single-chip, fully programmable Ray Processing Unit (RPU). It combines the flexibility of general-purpose CPUs with the efficiency of current GPUs for data-parallel computations. This design allows for realtime ray tracing of dynamic scenes with programmable material, geometry, and illumination shaders. Although running at only 66 MHz, the prototype FPGA implementation already renders images at up to 20 frames per second, which in many cases beats the performance of highly optimized software running on multi-GHz desktop CPUs. The performance and efficiency of the proposed architecture are analyzed using a variety of benchmark scenes.
StreamIt: A Compiler for Streaming Applications
- 2001
Cited by 14 (5 self)
Streaming programs represent an increasingly important and widespread class of applications that holds unprecedented opportunities for high-impact compiler technology. Unlike sequential programs with obscured dependence information and complex communication patterns, a stream program is naturally written as a set of concurrent filters with regular steady-state communication. The StreamIt language aims to provide a natural, high-level syntax that improves programmer productivity in the streaming domain. At the same time, the language imposes a hierarchical structure on the stream graph that enables novel representations and optimizations within the StreamIt compiler. We define the "stream dependence function", a fundamental relationship between the input channels of two filters in a stream graph. We also describe a suite of stream optimizations, a denotational semantics for validating these optimizations, and a novel phased scheduling algorithm for stream graphs. In addition, we have implemented a prototype of the StreamIt optimizing compiler that is showing promising results.
Constrained and phased scheduling of synchronous data flow graphs for StreamIt language
- Master's thesis, MIT, 2002
Tradeoff between Data-, Instruction-, and Thread-level Parallelism in Stream Processors
- In Proceedings of the 21st ACM International Conference on Supercomputing (ICS’07), 2007
Cited by 2 (1 self)
This paper explores the scalability of the Stream Processor architecture along the instruction-, data-, and thread-level parallelism dimensions. We develop detailed VLSI-cost and processor-performance models for a multi-threaded Stream Processor and evaluate the tradeoffs, in both functionality and hardware costs, of mechanisms that exploit the different types of parallelism. We show that the hardware overhead of supporting coarse-grained independent threads of control is 15-86%, depending on machine parameters. We also demonstrate that the performance gains provided are of a smaller magnitude for a set of numerical applications. We argue that for stream applications with scalable parallel algorithms the performance is not very sensitive to the control structures used within a large range of area-efficient architectural choices. We evaluate the specific effects on performance of scaling along the different parallelism dimensions and explain the limitations of the ILP, DLP, and TLP hardware mechanisms.
Architectural Support for the Stream Execution Model on General-Purpose Processors
Cited by 2 (0 self)
There has recently been much interest in stream processing, both in industry (e.g., Cell, NVIDIA G80, ATI R580) and academia (e.g., Stanford Merrimac, MIT RAW), with stream programs becoming increasingly popular for both media and more general-purpose computing. Although a special style of programming called stream programming is needed to target these stream architectures, huge performance benefits can be achieved. In this paper, we minimally add architectural features to commodity general-purpose processors (e.g., Intel/AMD) to efficiently support the stream execution model. We design the extensions to reuse existing components of the general-purpose processor hardware as much as possible by investigating low-cost modifications to the CPU caches, hardware prefetcher, and the execution core. With a less than 1% increase in die area along with judicious use of a software runtime system, we show that we can efficiently support stream programming on traditional processor cores. We evaluate our techniques by running scientific applications on a cycle-level simulation system. The results show that our system executes stream programs as efficiently as possible, limited only by the ALU performance and the memory bandwidth needed to feed the ALUs.
A Common Machine Language for Grid-Based Architectures
Cited by 1 (1 self)
A common machine language is an essential abstraction that allows programmers to express an algorithm in a way that can be efficiently executed on a variety of architectures. The key properties of a common machine language (CML) are: 1) it abstracts away the idiosyncratic differences between one architecture and another so that a programmer doesn't have to worry about them, and 2) it encapsulates the common properties of the architectures such that a compiler for any given target can still produce an efficient executable. For von Neumann architectures, the canonical CML is C: instructions consist of basic arithmetic operations, executed sequentially, which operate on either local variables or values drawn from a global block of memory. C has been implemented efficiently on a wide range of architectures, and it saves the programmer from having to adapt to each kind of register layout, cache configuration, and instruction set. However, recent years have seen the emergence of a class...
SaarCOR - A Hardware Architecture for Ray Tracing
- 2002
The ray tracing algorithm is well-known for its ability to generate high-quality images and its flexibility to support advanced rendering and lighting effects. Interactive ray tracing has been shown to work well on clusters of PCs and supercomputers but direct hardware support for ray tracing has been difficult to implement.

