### Control Flow Emulation on Tiled SIMD Architectures

Abstract. Heterogeneous multi-core and streaming architectures such as the GPU, Cell, ClearSpeed, and Imagine processors have better power/performance ratios and memory bandwidth than traditional architectures. These types of processors are increasingly being used to accelerate compute-intensive applications. Their performance advantage is achieved by using multiple SIMD processor cores but limiting the complexity of each core, and by combining this with a simplified memory system. In particular, these processors generally avoid the use of cache coherency protocols and may even omit general-purpose caches, opting for restricted caches or explicitly managed local memory. We show how control flow can be emulated on such tiled SIMD architectures and how memory access can be organized to avoid the need for a general-purpose cache and to tolerate long memory latencies. Our technique uses streaming execution and multipass partitioning. Our prototype targets GPUs. On GPUs the memory system is deeply pipelined and the read and write caches are not coherent, so reads and writes may not use the same memory locations simultaneously. This requires the use of double-buffered streaming. We emulate general control flow in a way that is transparent to the programmer and include specific optimizations in our approach that can deal with double-buffering.
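As a rough illustration of the double-buffered, multipass style the abstract describes (not the authors' implementation), the following Python sketch emulates a data-dependent `while` loop over a stream. Each pass reads only from one buffer and writes only to the other, and the buffers are then swapped. The names `multipass_while`, `cond`, and `body` are hypothetical, and the inner `for` loop stands in for what would be a data-parallel kernel on real hardware.

```python
def multipass_while(stream, cond, body, max_passes=1000):
    """Emulate `while cond(x): x = body(x)` for every element of a stream.

    Assumption: a pass may read only from `src` and write only to `dst`,
    mimicking hardware where one kernel cannot read and write the same
    memory. Elements whose condition is false are copied through
    unchanged (a simple form of predication).
    """
    src = list(stream)
    dst = [None] * len(src)
    passes = 0
    while passes < max_passes:
        any_active = False
        # Conceptually a data-parallel kernel over all stream elements.
        for i, x in enumerate(src):
            if cond(x):
                dst[i] = body(x)
                any_active = True
            else:
                dst[i] = x
        if not any_active:
            break  # no element changed, so `src` already holds the result
        src, dst = dst, src  # swap the read and write buffers
        passes += 1
    return src, passes


# Halve each element until it is no larger than 10.
result, passes = multipass_while([3, 40, 100], lambda x: x > 10, lambda x: x // 2)
print(result, passes)  # [3, 10, 6] 4
```

The number of passes is set by the slowest element, which is one reason a real system would want optimizations beyond this naive scheme.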

### The Advent of Recursion . . .

The term ‘recursive’ has had different meanings during the past two centuries among various communities of scholars. Its historical epistemology has already been described by Soare (1996) with respect to the mathematicians, logicians, and recursive-function theorists. The computer practitioners, on the other hand, are discussed in this paper by focusing on the definition and implementation of the ALGOL60 programming language. Recursion entered ALGOL60 in two novel ways: (i) syntactically with what we now call BNF notation, and (ii) dynamically by means of the recursive procedure. As is shown, both (i) and (ii) were introduced by linguistically-inclined programmers who were not versed in logic and who, rather unconventionally, abstracted away from the down-to-earth practicalities of their computing machines. By the end of the 1960s, some computer practitioners had become aware of the theoretical insignificance of the recursive procedure in terms of computability, though without relying on recursive-function theory. The presented results help us to better understand the technological ancestry of modern-day computer science, in the hope that contemporary researchers can more easily build upon its past.

### Universality and Semicomputability for Nondeterministic Programming Languages over Abstract Algebras

, 2006

The Universal Function Theorem (UFT) originated in the 1930s with the work of Alan Turing, who proved the existence of a universal Turing machine for computations on strings over a finite alphabet. This stimulated the development of stored-program computers. Classical computability theory, including the UFT and the theory of semicomputable sets, has been extended by Tucker and Zucker to abstract many-sorted algebras, with algorithms formalized as deterministic While programs. This paper investigates the extension of this work to the nondeterministic programming languages While RA consisting of While programs extended by random assignments, as well as sublanguages of While RA formed by restricting the random assignments to booleans or naturals only. It also investigates the nondeterministic language GC of guarded commands. There are two topics of investigation: (1) the extent to which the UFT holds over abstract algebras in these languages; (2) concepts of semicomputability for these languages, and the extent to which they coincide with semicomputability for the deterministic While language.

Keywords: data types, abstract computability, random assignments, guarded commands, nondeterminism.

### Effectiveness

, 2011

We describe axiomatizations of several aspects of effectiveness: effectiveness of transitions; effectiveness relative to oracles; and absolute effectiveness, as posited by the Church-Turing Thesis.

"Efficiency is doing things right; effectiveness is doing the right things." (Peter F. Drucker)

### Causal commutative arrows

Arrows are a popular form of abstract computation. Being more general than monads, they are more broadly applicable, and, in particular, are a good abstraction for signal processing and dataflow computations. Most notably, arrows form the basis for a domain-specific language called Yampa, which has been used in a variety of concrete applications, including animation, robotics, sound synthesis, control systems, and graphical user interfaces. Our primary interest is in better understanding the class of abstract computations captured by Yampa. Unfortunately, arrows are not concrete enough to do this with precision. To remedy this situation, we introduce the concept of commutative arrows that capture a noninterference property of concurrent computations. We also add an init operator that captures the causal nature of arrow effects, and identify its associated law. To study this class of computations in more detail, we define an extension to arrows called causal commutative arrows (CCA), and study its properties. Our key contribution is the identification of a normal form for CCA called causal commutative normal form (CCNF). By defining a normalization procedure, we have developed an optimization strategy that yields dramatic improvements in performance over conventional implementations of arrows. We have implemented this technique in Haskell, and conducted benchmarks that validate the effectiveness of our approach. When compiled with the Glasgow Haskell Compiler (GHC), the overall methodology can result in significant speedups.
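One way to picture why a normal form can pay off is to assume, for illustration only, that a normalized signal function has the shape of a single feedback loop: an initial state wrapped around one pure step function. Running it then needs no arrow combinators at all, just a tight loop. The Python sketch below shows that shape; `run_loop_d` and `integral_step` are hypothetical names, and this is not the paper's Haskell implementation.

```python
def run_loop_d(init_state, step, inputs):
    """Run a signal function given as (initial state, pure step function).

    Assumption: `step(x, state)` returns `(output, next_state)` and has no
    side effects, so the whole computation is one loop over the input samples.
    """
    state = init_state
    outputs = []
    for x in inputs:
        y, state = step(x, state)
        outputs.append(y)
    return outputs


def integral_step(dt):
    """Forward-Euler integrator: output the accumulator, then advance it."""
    def step(x, acc):
        return acc, acc + dt * x
    return step


# Integrate the constant signal 1.0 with a time step of 0.5.
samples = run_loop_d(0.0, integral_step(0.5), [1.0, 1.0, 1.0, 1.0])
print(samples)  # [0.0, 0.5, 1.0, 1.5]
```

The output at each step depends only on the current input and the stored state, which is the causal behavior that an `init`-style delay is meant to capture.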

### PREPRINT Authors’ information

Why would you want to read this chapter? This chapter is about how to better understand the dynamics of computer models using both simulation and mathematical analysis. Our starting point is a computer model which is already implemented and ready to be run; our objective is to gain a thorough understanding of its dynamics. This chapter shows how computer simulation and mathematical analysis can be used together to understand the dynamics of computer models. For this purpose, we show that it is useful to see the computer model as a particular implementation of a formal model in a certain programming language. This formal model is the abstract entity which is defined by the input-output relation that the computer model executes, and can be seen as a function that transforms probability distributions over the set of possible inputs into probability distributions over the set of possible outputs. It is shown here that both computer simulation and mathematical analysis are extremely

, 2013

We apply Andy Pitts’s methods of defining relations over domains to several classical results in the literature. We show that the Y combinator coincides with the domain-theoretic fixpoint operator, that parallel-or and the Plotkin existential are not definable in PCF, that the continuation semantics for PCF coincides with the direct semantics, and that our domain-theoretic semantics for PCF is adequate for reasoning about contextual equivalence in an operational semantics. Our version of PCF is untyped and has both strict and non-strict function abstractions. The development is carried out in HOLCF.
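The first of these results can be given an informal, executable flavor. The Python sketch below is an illustration only, unrelated to the HOLCF development. It builds a recursive function in two ways: with the call-by-value variant of the Y combinator (often called Z, needed because Python is strict), and as a finite Kleene approximation obtained by applying the same functional repeatedly to an everywhere-undefined function. All names are hypothetical.

```python
def Z(f):
    """Call-by-value fixpoint combinator (eta-expanded Y)."""
    return (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))


def fact_functional(rec):
    """The non-recursive functional whose fixpoint is factorial."""
    return lambda n: 1 if n == 0 else n * rec(n - 1)


def bottom(_):
    """Stand-in for the undefined function: fails on every input."""
    raise RuntimeError("undefined")


def kleene_approx(functional, k):
    """Apply `functional` k times to `bottom` (k-th Kleene approximation)."""
    g = bottom
    for _ in range(k):
        g = functional(g)
    return g


fact = Z(fact_functional)
print(fact(5))                               # 120
print(kleene_approx(fact_functional, 6)(5))  # 120
```

The k-th approximation is defined exactly on inputs below k and agrees with `fact` there; the domain-theoretic fixpoint is the least upper bound of this chain, which is the object the abstract says the Y combinator coincides with.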