Results 11–16 of 16
Computational Divided Differencing and Divided-Difference Arithmetics
, 2000
Abstract

Cited by 1 (0 self)
Tools for computational differentiation transform a program that computes a numerical function F(x) into a related program that computes F′(x) (the derivative of F). This paper describes how techniques similar to those used in computational-differentiation tools can be used to implement other program transformations, in particular, a variety of transformations for computational divided differencing. The specific technical contributions of the paper are as follows: It presents a program transformation that, given a numerical function F(x) defined by a program, creates a program that computes F[x0, x1], the first divided difference of F(x), where F[x0, x1] is defined as (F(x0) − F(x1)) / (x0 − x1) if x0 ≠ x1, and as (d/dz)F(z) evaluated at z = x0 if x0 = x1. It shows how computational first divided differencing generalizes computational differentiation. It presents a second program transformation that permits the creation of higher-order divided differences of a numerical function de ...
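The confluent first divided difference defined above can be sketched directly in Python (a minimal illustration of the definition only; the paper derives such code by program transformation rather than writing it by hand, and the function names here are hypothetical):

```python
def divided_difference(f, df, x0, x1):
    """First divided difference F[x0, x1]: the difference quotient
    when x0 != x1, falling back to the derivative F'(x0) in the
    confluent case x0 == x1."""
    if x0 != x1:
        return (f(x0) - f(x1)) / (x0 - x1)
    return df(x0)

# Example with F(x) = x**2, so F'(x) = 2*x.
f = lambda x: x ** 2
df = lambda x: 2 * x

print(divided_difference(f, df, 3.0, 1.0))  # (9 - 1) / (3 - 1) = 4.0
print(divided_difference(f, df, 3.0, 3.0))  # confluent case: F'(3) = 6.0
```

Note that the naive quotient loses accuracy to cancellation when x0 and x1 are close but unequal; computing divided differences accurately by transforming the program for F, rather than evaluating the quotient directly, is part of the motivation for this line of work.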
A Study of Hardware Programming from a Compilation Perspective Ph.D. Research Proposal
, 2005
Abstract
I propose a systematic review and evaluation of the use of general-purpose, high-level programming languages for the design and synthesis of circuit specifications that implement algorithms directly as specialized hardware configurations. Specifically, I propose an examination of the so-called semantic gap between the understood features and semantics of popular software programming languages such as C and C++, and the capabilities of programmable logic devices such as FPGAs. A significant amount of research effort has already been devoted to this topic, but it is my belief that this research has generally failed to adequately address certain key issues in both principle and implementation. My research will comprise a study of the theory and practice of programming hardware descriptions, with the aim of providing insights that suggest how to bridge the semantic gap and yield more effective hardware programming techniques.
Optimizing the Stack Size of Recursive Functions
Abstract
For memory-constrained environments, optimization for program size is often as important as, if not more important than, optimization for execution speed. Commonly, compilers try to reduce the code segment but neglect the stack segment, although the stack can grow significantly during the execution of recursive functions because a separate activation record is required for each recursive call. If a formal parameter or local variable is dead at all recursive calls, then it can be declared global so that only one instance exists, independent of the recursion depth. We found that in 70% of our benchmark functions, it is possible to reduce the stack size by declaring formal parameters and local variables global. Often, live ranges of formal parameters and local variables can be split at recursive calls through program transformations. These splitting transformations allowed us to further optimize the stack size of all our benchmark functions. If all formal parameters and local variables can be declared global, then such functions may be transformable into iterations. This was possible for all such benchmark functions.
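The globalization idea above can be illustrated with a small sketch (Python stands in here for the compiler-level transformation; the paper operates on activation records in compiled code, and the function and variable names are hypothetical):

```python
# Before: every activation record carries its own copy of `scale`,
# even though the value is identical in all frames and `scale` is
# not needed after the recursive call returns.
def scaled_sum(values, i, scale):
    if i >= len(values):
        return 0
    return scale * values[i] + scaled_sum(values, i + 1, scale)

# After: `scale` is hoisted to a single module-level variable, so
# only one instance exists regardless of recursion depth, shrinking
# each frame (the paper's globalization transformation, sketched).
_scale = None

def scaled_sum_global(values, i):
    if i >= len(values):
        return 0
    return _scale * values[i] + scaled_sum_global(values, i + 1)

_scale = 2
print(scaled_sum([1, 2, 3], 0, 2))      # 12
print(scaled_sum_global([1, 2, 3], 0))  # 12
```

Once no per-frame state remains (all parameters and locals globalized), such a function becomes a candidate for conversion to an iterative loop, as the abstract notes.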
Program Parallelization using Synchronized Pipelining
Abstract
Abstract. While there are well-understood methods for detecting loops whose iterations are independent and parallelizing them, there are comparatively fewer proposals that support parallel execution of a sequence of loops or nested loops in the case where such loops have dependencies among them. This paper introduces a refined notion of independence, called eventual independence, that in its simplest form considers two loops, say loop 1 and loop 2, and captures the idea that for every i there exists k such that the (i+1)-th iteration of loop 2 is independent from the j-th iteration of loop 1, for all j ≥ k. Eventual independence provides the foundation of a semantics-preserving program transformation, called synchronized pipelining, that makes execution of consecutive or nested loops parallel, relying on a minimal number of synchronization events to ensure semantics preservation. The practical benefits of synchronized pipelining are demonstrated through experimental results on common algorithms such as sorting and Fourier transforms.
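The pipelining of two dependent loops can be mimicked with threads and one shared progress counter (a hedged Python illustration, not the paper's transformation; `progress`, `loop1`, and `loop2` are hypothetical names). Here loop 2's iteration i reads a[i] written by loop 1's iteration i, so it may start as soon as loop 1 has advanced past index i, rather than waiting for loop 1 to finish:

```python
import threading

N = 8
a = [0] * N
b = [0] * N
progress = -1  # highest index loop 1 has written so far
cv = threading.Condition()

def loop1():
    """Producer loop: writes a[i] and publishes its progress."""
    global progress
    for i in range(N):
        a[i] = i * i
        with cv:
            progress = i
            cv.notify_all()  # synchronization event

def loop2():
    """Consumer loop: iteration i depends only on loop 1's
    iterations j <= i, so both loops run concurrently."""
    for i in range(N):
        with cv:
            cv.wait_for(lambda: progress >= i)
        b[i] = a[i] + 1

t1 = threading.Thread(target=loop1)
t2 = threading.Thread(target=loop2)
t1.start(); t2.start()
t1.join(); t2.join()
print(b)  # [1, 2, 5, 10, 17, 26, 37, 50]
```

The condition variable plays the role of the synchronization events in the abstract: loop 2 blocks only when it would run ahead of the dependence, preserving the sequential semantics.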
Challenges in Exploitation of Loop Parallelism in Embedded Applications
Abstract
Embedded processors have been increasingly exploiting hardware parallelism. Vector units, multiple processors or cores, hyper-threading, special-purpose accelerators such as DSPs or cryptographic engines, or a combination of the above have appeared in a number of processors. They serve to address the increasing performance requirements of modern embedded applications. How this hardware parallelism can be exploited by applications is directly related to the amount of parallelism inherent in a target application. In this paper we evaluate the performance potential of different types of parallelism, viz., true thread-level parallelism, speculative thread-level parallelism, and vector parallelism, when executing loops. Applications from the industry-standard EEMBC 1.1, EEMBC 2.0, and MiBench embedded benchmark suites are analyzed using the Intel C compiler. The results show what can be achieved today, provide upper bounds on the performance potential of different types of thread parallelism, and point out a number of issues that need to be addressed to improve performance. The latter include parallelization of libraries such as libc and the design of parallel algorithms to allow maximal exploitation of parallelism. The results also point to the need for developing new benchmark suites more suitable to parallel compilation and execution.