A Stream Compiler for Communication-Exposed Architectures
- In Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems, 2002
Cited by 61 (16 self)
With the increasing miniaturization of transistors, wire delays are becoming a dominant factor in microprocessor performance. To address this issue, a number of emerging architectures contain replicated processing units with software-exposed communication between one unit and another (e.g., Raw, iWarp, SmartMemories). However, for their use to be widespread, it will be necessary to develop compiler technology that enables a portable, high-level language to execute efficiently across a range of wire-exposed architectures.
Linear Analysis and Optimization of Stream Programs
- In PLDI, 2003
Cited by 23 (8 self)
As more complex DSP algorithms are realized in practice, there is an increasing need for high-level stream abstractions that can be compiled without sacrificing efficiency. Toward this end, we present a set of aggressive optimizations that target linear sections of a stream program. Our input language is StreamIt, which represents programs as a hierarchical graph of autonomous filters. A filter is linear if each of its outputs can be represented as an affine combination of its inputs. Linearity is common in DSP components; examples include FIR filters, expanders, compressors, FFTs and DCTs.
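The definition of a linear filter above can be illustrated with a minimal Python sketch. It assumes a deliberately simplified model in which a filter maps one input window `x` to outputs `y = A x + b`, and it ignores the peek/pop/push rates a real stream compiler has to account for. The names `apply_affine` and `compose` are hypothetical and are not taken from the paper or from the StreamIt compiler.

```python
def apply_affine(A, b, x):
    """Return y = A x + b, where A is a list of rows and b an offset vector."""
    return [sum(a * xi for a, xi in zip(row, x)) + bi for row, bi in zip(A, b)]


def compose(A1, b1, A2, b2):
    """Collapse filter 1 followed by filter 2 into a single affine map.

    y = A2 (A1 x + b1) + b2 = (A2 A1) x + (A2 b1 + b2)
    """
    inner = len(A1)        # number of outputs of filter 1 (= inputs of filter 2)
    cols = len(A1[0])      # number of inputs of filter 1
    A = [[sum(A2[i][k] * A1[k][j] for k in range(inner)) for j in range(cols)]
         for i in range(len(A2))]
    b = [sum(A2[i][k] * b1[k] for k in range(inner)) + b2[i]
         for i in range(len(A2))]
    return A, b


# A 3-tap FIR filter is linear: one output that is a weighted sum of 3 inputs.
fir_A, fir_b = [[1, 2, 1]], [0]
# A gain-and-offset stage (y = 2x + 1) is affine as well.
gain_A, gain_b = [[2]], [1]

A, b = compose(fir_A, fir_b, gain_A, gain_b)
x = [1, 2, 3]
two_step = apply_affine(gain_A, gain_b, apply_affine(fir_A, fir_b, x))
one_step = apply_affine(A, b, x)
```

Under these assumptions the two-stage pipeline and the collapsed single filter produce the same output, which is the property that lets adjacent linear sections be merged.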
Runtime Support for Multicore Haskell
Cited by 18 (5 self)
Purely functional programs should run well on parallel hardware because of the absence of side effects, but it has proved hard to realise this potential in practice. Plenty of papers describe promising ideas, but vastly fewer describe real implementations with good wall-clock performance. We describe just such an implementation, and quantitatively explore some of the complex design tradeoffs that make such implementations hard to build. Our measurements are necessarily detailed and specific, but they are reproducible, and we believe that they offer some general insights.
Teleport messaging for distributed stream programs
- In Symposium on Principles and Practice of Parallel Programming (PPoPP), 2005
Cited by 15 (8 self)
In this paper, we develop a new language construct to address one of the pitfalls of parallel programming: precise handling of events across parallel components. The construct, termed teleport messaging, uses data dependences between components to provide a common notion of time in a parallel system. Our work is done in the context of the Synchronous Dataflow (SDF) model, in which computation is expressed as a graph of independent components (or actors) that communicate in regular patterns over data channels. We leverage the static properties of SDF to compute a stream dependence function, sdep, that compactly describes the ordering constraints between actor executions. Teleport messaging utilizes sdep to provide powerful and precise event handling. For example, an actor A can specify that an event should be processed by a downstream actor B as soon as B sees the “effects” of the current execution of A. We argue that teleport messaging improves readability and robustness over existing practices. We have implemented messaging as part of the StreamIt compiler, with a backend for a cluster of workstations. As teleport messaging exposes optimization opportunities to the compiler, it also results in a 49% performance improvement for a software radio benchmark.
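The abstract does not spell out how sdep is computed. As a rough illustration only, the Python sketch below assumes the simplest possible case: a two-actor pipeline A → B with fixed rates, where the dependence is read as "the minimum number of firings of A needed before B can fire n times". The function name `sdep` mirrors the abstract, but the signature, the rate parameters `push`, `pop` and `peek`, and the closed form are assumptions of this sketch, not the paper's algorithm.

```python
def sdep(n, push, pop, peek=None):
    """Minimum firings of upstream actor A so that downstream actor B can fire n times.

    Assumed model (not from the source): A pushes `push` items per firing;
    B pops `pop` items per firing and must see at least `peek` items
    (peek >= pop) on its input channel before it can fire.
    """
    if peek is None:
        peek = pop
    if n <= 0:
        return 0
    # Items that must have been produced before B's n-th firing can start.
    needed = (n - 1) * pop + max(peek, pop)
    # Ceiling division: whole firings of A required to produce `needed` items.
    return -(-needed // push)
```

For example, with `push=2` and `pop=3`, B's first firing needs two firings of A and its second needs three. Because SDF rates are static, a table of this kind can be computed entirely at compile time.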
Implicitly-threaded parallelism in Manticore
- In ICFP ’08, 2008
Cited by 13 (5 self)
The increasing availability of commodity multicore processors is making parallel computing available to the masses. Traditional parallel languages are largely intended for large-scale scientific computing and tend not to be well-suited to programming the applications one typically finds on a desktop system. Thus we need new parallel-language designs that address a broader spectrum of applications. In this paper, we present Manticore, a language for building parallel applications on commodity multicore hardware including a diverse collection of parallel constructs for different granularities of work. We focus on the implicitly-threaded parallel constructs in our high-level functional language. We concentrate on those elements that distinguish our design from related ones, namely, a novel parallel binding form, a nondeterministic parallel case form, and exceptions in the presence of data parallelism. These features differentiate the present work from related work on functional data parallel language designs, which has focused largely on parallel problems with regular structure and the compiler transformations — most notably, flattening — that make such designs feasible. We describe our implementation strategies and present some detailed examples utilizing various mechanisms of our language.
Optimizing stream programs using linear state space analysis
- In CASES, 2005
Cited by 12 (6 self)
Digital Signal Processing (DSP) is becoming increasingly widespread in portable devices. Due to harsh constraints on power, latency, and throughput in embedded environments, developers often appeal to signal processing experts to hand-optimize algorithmic aspects of the application. However, such DSP optimizations are tedious, error-prone, and expensive, as they require sophisticated domain-specific knowledge. We present a general model for automatically representing and optimizing a large class of signal processing applications. The model is based on linear state space systems. A program is viewed as a set of filters, each of which has an input stream, an output stream, and a set of internal states. At each time step, the filter produces some outputs that are a linear combination of the inputs and the state values; the state values are also updated in a linear fashion. Examples of linear state space filters include IIR filters and linear difference equations. Using the state space representation, we describe a novel set of program transformations, including combination of adjacent filters, elimination of redundant states and reduction of the number of system parameters. We have implemented the optimizations in the StreamIt compiler and demonstrate improved generality over previous techniques. Categories and Subject Descriptors D.3.4 [Programming Languages]: Processors—Optimization; compilers; code generation; D.2.2 [Software Engineering]: Software Architectures—Domain-specific architectures;
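The linear state space model described in this abstract can be sketched in a few lines of Python. The sketch assumes the conventional form y = C x + D u and x' = A x + B u with a single scalar input per time step. The function name `run_state_space` and the matrix names are illustrative choices, not identifiers from the paper or the StreamIt compiler.

```python
def run_state_space(A, B, C, D, x0, inputs):
    """Simulate a single-input linear state space filter.

    Per time step, for scalar input u and state vector x:
        y  = C x + D u   (outputs: linear in inputs and state)
        x' = A x + B u   (state update: also linear)
    A and C are lists of rows; B and D are plain vectors.
    """
    x = list(x0)
    outputs = []
    for u in inputs:
        y = [sum(c * xi for c, xi in zip(C[i], x)) + D[i] * u
             for i in range(len(C))]
        x = [sum(a * xi for a, xi in zip(A[i], x)) + B[i] * u
             for i in range(len(A))]
        outputs.append(y)
    return outputs


# First-order IIR filter y[n] = 0.5 * y[n-1] + u[n], keeping y[n-1] as the state.
a = 0.5
impulse_response = run_state_space(
    A=[[a]], B=[1.0], C=[[a]], D=[1.0], x0=[0.0], inputs=[1.0, 0.0, 0.0]
)
```

Feeding in a unit impulse yields the geometrically decaying response expected of this IIR filter, with the whole filter captured by four small matrices that a compiler could then combine or reduce.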
MPEG-2 Decoding in a Stream Programming Language
- In IPDPS, Rhodes Island, 2006
Cited by 10 (3 self)
Image and video codecs are prevalent in multimedia devices, ranging from embedded systems, to desktop computers, to high-end servers such as HDTV editing consoles. It is not uncommon, however, that developers create and customize separate coder and decoder implementations for each of the architectures they target. This practice is time-consuming and error-prone, leading to code that is neither malleable nor portable. This paper describes an implementation of the MPEG-2 decoder using the StreamIt programming language. StreamIt is an architecture-independent stream language that aims to improve programmer productivity, while concomitantly exposing the inherent parallelism and communication topology of the application. The paper shows that MPEG is a good match for the streaming programming model and illustrates the malleability of the implementation using a simple modification to the decoder to support alternate color compression formats. StreamIt allows for modular application development, which increases code reuse, and reduces the complexity of the debugging process since stream components can be verified independently. This in turn leads to greater programmer productivity.
Constrained and phased scheduling of synchronous data flow graphs for StreamIt language
- Master’s thesis, MIT, 2002
Clustered Workflow Execution of Retargeted Data Analysis Scripts
- Eighth IEEE International Symposium on Cluster Computing and the Grid
Cited by 3 (1 self)
Supercomputing advances have enabled computational science data volumes to grow at ever-increasing rates, commonly resulting in more data produced than can be practically analyzed. Whole-dataset download costs have grown to impractical heights, even with multi-Gbps networks, forcing scientists to rely on server-side subsetting and limiting the scope of data they can analyze on a workstation. Our system supplements existing scientific data services with lightweight computational capability, providing a means of safely relocating analysis from the desktop to the server where clustered execution can be coordinated, exploiting data locality, reducing unnecessary data transfer, and providing end-users with results several times faster. We show how dataflow and other compiler-inspired analyses of shell scripts of scientists’ most common analysis tools enable parallelization and optimizations in disk and network I/O bandwidth. We benchmark using an actual geoscience analysis script, illustrating the crucial performance gains of extracting workflows defined in scripts and optimizing their execution. Current results quantify significant improvements in performance, showing the promise of bringing transparent high-performance analysis to the scientist’s desktop.
Programming with Exceptions in JCilk
JCilk extends the serial subset of the Java language by importing the fork-join primitives spawn and sync from the Cilk multithreaded language, thereby providing call-return semantics for multithreaded subcomputations. In addition, JCilk transparently integrates Java’s exception handling with multithreading by extending the semantics of Java’s try

