## Fujitsu Labs of America

### BibTeX

@MISC{Li_fujitsulabs,

author = {Guodong Li and Ganesh Gopalakrishnan},

title = {Fujitsu Labs of America},

year = {}

}

### Abstract

We present an automated symbolic verifier that checks the functional correctness of GPGPU kernels parametrically, that is, for an arbitrary number of threads. Our tool, PUGpara, checks the functional equivalence of a kernel and its optimized versions, helping debug errors introduced by optimizations for memory coalescing and bank-conflict elimination. Key features of our work include: (1) a symbolic method to encode a comparative assertion across two kernel versions, and (2) techniques to overcome SMT solver restrictions through over-approximations, yielding an efficient bug-hunting method.

### Citations

472 | The Omega test: A fast and practical integer programming algorithm for dependence analysis
- Pugh
- 1991
Citation Context ...omputational rounds. Over these rounds, it symbolically reasons about the possible values of shared variables contributed by all threads. From one perspective, it implicitly implements the Omega Test [20] using SMT techniques. When this checking approach applies to a kernel, it is sound (no false alarms will be reported). We also propose an over-approximation approach to combat the capacity limits of ...

109 | Programming Massively Parallel Processors: A Hands-on Approach, Applications of GPU Computing Series
- Kirk, Hwu
- 2010
Citation Context ...c Analysis; Correctness of Optimizations. I. INTRODUCTION. There is an explosive growth of interest in Graphics Processing Units (GPUs) for speeding up computations occurring at all application scales [11]. When properly programmed, GPUs can yield 20x to 100x the performance of traditional CPUs. Often this requires heroic acts of programming: (i) keep the GPU threads busy; (ii) ensure coalesced da...

80 | Automatic deductive verification with invisible invariants
- Pnueli, Ruah, et al.
- 2001
Citation Context ...ing method for the number of processes satisfying a given predicate. These techniques require manual effort to obtain the abstractions. They also do not directly apply to GPUs. There are efforts [1], [18] that apply automatic induction to generate and verify invariants pertaining to parameterized systems. In most cases, manual effort is required to obtain the invariants, although Pnueli et al. [18] pr...

71 | Parameterized verification with automatically computed inductive assertions
- Arons, Pnueli, et al.
- 2001
Citation Context ...counting method for the number of processes satisfying a given predicate. These techniques require manual effort to obtain the abstractions. They also do not directly apply to GPUs. There are efforts [1], [18] that apply automatic induction to generate and verify invariants pertaining to parameterized systems. In most cases, manual effort is required to obtain the invariants, although Pnueli et al. [...

33 | TVOC: A Translation Validator for Optimizing Compilers
- Barrett, Fang, et al.
- 2005
Citation Context ... only linear arithmetic. Since these works do not exploit decision procedures, they are unable to handle many arithmetic transformations. They can only deal with highly similar programs. TVOC [2] first verifies loop transformations using a specific proof rule called Permute, and then verifies structure-preserving optimizations. It relies on extra information supplied by the compiler to generat...

27 | Automated Dynamic Analysis of CUDA Programs
- Boyer, Skadron, et al.
- 2008
Citation Context ...ply to emerging standards (e.g., OpenCL [16]). There are only a few GPU-specific checkers reported in the past. Table I gives a comparison of these tools. An instrumentation based technique is reported [4] to find races and shared memory bank conflicts. This is a dynamic ...

Table I (truncated):

| Comparison Categories | PUGpara (extended from [13]) | GKLEE [14] | [4] (GRace [27]) |
| --- | --- | --- | --- |
| Methodology | Symbolic Analysis | Concolic Exec. in virtual ... | |

15 | Scalable SMT-based verification of GPU kernel functions
- Li, Gopalakrishnan
- 2010
Citation Context ...ight Single Instruction Multiple Data (SIMD) threads that synchronize sparingly using barriers. These bear little resemblance to C/Java threads, which are heavy-weight and synchronize using locks/monitors. In [13], we introduce an SMT [22] based approach for analyzing GPU kernels through a new tool PUG that can handle kernels of thousands of lines of code, but only for a fixed number (e.g., two or three) of threads. ...

13 | GKLEE: Concolic verification and test generation for GPUs
- Li, Li, et al.
- 2012

11 | Covac: Compiler validation by program analysis of the cross-product
- Zaks, Pnueli
- 2008
Citation Context ...ucture-preserving optimizations. It relies on extra information supplied by the compiler to generate verification conditions which are fed to an SMT solver for satisfiability checking. Zaks and Pnueli [26] also used SMT solving to verify structure-preserving optimizations. Their verifier attempts to find invariants connecting the models of the two programs. However, it is difficult to identify sufficie...

11 | Trusted source translation of a total function language
- Li, Slind
- 2008
Citation Context ...tations of our approach by introducing a richer set of inference rules (e.g. for complicated loop optimizations). Typical transformation rules can be verified once and for all, or over each execution [15], [12]. Symmetry Reduction. In many cases, loop bounds in a CUDA kernel depend on the size of a block. As many CUDA kernels are designed to run on arbitrarily-sized blocks [8], one can expect to be a...

10 | Relational verification using product programs
- Barthe, Crespo, et al.
- 2011
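Citation Context ... at i to x. Note that a write to an array variable is actually modeled as an array update. We also give below a simple example of applying Γ.

$$
\begin{aligned}
\Gamma(e_1 \;\mathit{op}\; e_2) &\doteq \Gamma(e_1) \;\mathit{op}\; \Gamma(e_2) \\
\Gamma(v := e) &\doteq \mathit{vnext}(v) = \Gamma(e) \\
\Gamma(v[e_1] := e_2) &\doteq \mathit{vnext}(v) = \mathit{vcur}(v)([\Gamma(e_1)] \mapsto \Gamma(e_2)) \\
\Gamma(v) &\doteq \mathit{vcur}(v)
\end{aligned}
$$

Example: under Γ, the program `int k = 0; int a[3]; int i = a[1] + k; a[0] = i * k; i++;` translates to

$$
k_1 = 0, \qquad i_1 = a_0[1] + k_1, \qquad a_1 = a_0([0] \mapsto i_1 * k_1), \qquad i_2 = i_1 + 1
$$

Branches. ...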
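The versioning rules for Γ quoted in this context can be illustrated with a small sketch. It is not PUGpara's implementation: the function name `translate`, the tuple format for statements, and the textual `->` arrow standing in for the array-update mapping are all assumptions made for illustration. The sketch also assumes that source variable names do not already end in digits and that expressions contain only identifiers, integer literals, and operators.

```python
import re


def translate(statements):
    """Turn straight-line assignments into versioned constraints.

    Each statement is a hypothetical (target, index, rhs) tuple:
    index is None for `v := e`, or an expression string for `v[e1] := e2`.
    """
    version = {}       # variable name -> current version number (0 if unseen)
    constraints = []

    def cur(name):
        # vcur(v): the current versioned name of v
        return f"{name}{version.get(name, 0)}"

    def nxt(name):
        # vnext(v): bump the version and return the fresh name
        version[name] = version.get(name, 0) + 1
        return f"{name}{version[name]}"

    def expr(text):
        # Gamma on expressions: replace every identifier by its current version
        return re.sub(r"[A-Za-z_]\w*", lambda m: cur(m.group(0)), text)

    for target, index, rhs in statements:
        value = expr(rhs)  # reads use the versions in effect before the write
        if index is None:
            constraints.append(f"{nxt(target)} = {value}")
        else:
            idx = expr(index)
            old = cur(target)
            # An array write is modeled as an update of the previous array version
            constraints.append(f"{nxt(target)} = {old}([{idx}] -> {value})")
    return constraints


# The example program from the context, with `i++` written as `i = i + 1`
# and the declaration `int a[3]` omitted because it assigns nothing.
program = [
    ("k", None, "0"),
    ("i", None, "a[1] + k"),
    ("a", "0", "i * k"),
    ("i", None, "i + 1"),
]

for constraint in translate(program):
    print(constraint)
```

Each write introduces a fresh version of its target while reads refer to the version in effect at that point, which is what lets a sequence of assignments be handed to a solver as a conjunction of equalities.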

10 | Equivalence Checking of Static Affine Programs using Widening to Handle Recurrences
- Verdoolaege, Janssens, et al.
- 2008
Citation Context ...ead and does not require symmetry reduction. Equivalence Checking. Many approaches have been proposed for checking the equivalence of two sequential programs. For instance, equivalence checkers [21], [24] perform a dependence graph abstraction of programs containing only affine loops. The basic idea is to use the Omega test to check whether the relations depicting the dependence graphs are equal. Unfo...

9 | GRace: a low-overhead mechanism for detecting data races in GPU programs
- Zheng, Ravi, et al.
- 2011
Citation Context ...hese tools. An instrumentation based technique is reported [4] to find races and shared memory bank conflicts. This is a dynamic ...

Table I (truncated):

| Comparison Categories | PUGpara (extended from [13]) | GKLEE [14] | [4] (GRace [27]) |
| --- | --- | --- | --- |
| Methodology | Symbolic Analysis | Concolic Exec. in virtual machine | Dyn. Check (+ Static Analysis) |
| Level of Analysis | Source Code | LLVM Bytecode | Source Code Instrument. |
| Bugs Targeted | Race, Func. Corrct., ... | | |

6 | Real-time system verification by k-induction
- Pike
- 2005
Citation Context ...ntified formula since the definition of TRANS is recursive over the number of threads and the solver requires a concrete n to unroll the recursion. This also forbids using induction (e.g. k-induction [17]) to perform the proof. Moreover, the fact that our translation conjoins the models of n threads will make SMT solving quite complex (and of course it would not lead to a parametric approach). The As...

5 | Validated compilation through logic
- Li
- 2011
Citation Context ...s of our approach by introducing a richer set of inference rules (e.g. for complicated loop optimizations). Typical transformation rules can be verified once and for all, or over each execution [15], [12]. Symmetry Reduction. In many cases, loop bounds in a CUDA kernel depend on the size of a block. As many CUDA kernels are designed to run on arbitrarily-sized blocks [8], one can expect to be able to...

5 | An automatic verification technique for loop and data reuse transformations based on geometric modeling of programs
- Shashidhar, Bruynooghe, Catthoor, Janssens
Citation Context ...ed thread and does not require symmetry reduction. Equivalence Checking. Many approaches have been proposed for checking the equivalence of two sequential programs. For instance, equivalence checkers [21], [24] perform a dependence graph abstraction of programs containing only affine loops. The basic idea is to use the Omega test to check whether the relations depicting the dependence graphs are equal...

4 | CUDA programming guide version 1.1
Citation Context ...or over each execution [15], [12]. Symmetry Reduction. In many cases, loop bounds in a CUDA kernel depend on the size of a block. As many CUDA kernels are designed to run on arbitrarily-sized blocks [8], one can expect to be able to reduce block sizes to a reasonable value before running PUGpara. Currently, such downscaling is done manually. We plan to develop an automatic symmetry reduction approac...

2 | Proving Ptolemy right: The environment abstraction framework for model checking concurrent systems
- Clarke, Talupur, Veith
- 2008
Citation Context ...s floating-point reasoning methods can be incorporated into PUGpara, which currently lacks the ability to handle floating-point numbers. Parameterized Verification. There are abstraction based techniques [19], [5] that help reduce the problem of verifying parameterized systems with infinite states to that of checking corresponding finite-state abstractions. The abstraction methods employed include counter abst...

1 | Symbolic testing of OpenCL code, Haifa Verification Conference (HVC)
- Collingbourne, Cadar, et al.
- 2011
Citation Context ..., the techniques used in PUG can easily accommodate the use of symbolic thread identifiers. However, these straightforward extensions do not work for functional equivalence checking. The KLEE-FP tool [6] handles OpenCL code. Its main use is in cross-checking OpenCL code against an initial scalar sequential version, and also for race detection in such code. Its approach to floating-point equivalence is...

1 | Type-based race detection for Java, ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)
- Flanagan, Freund
- 2000
Citation Context ...nd examine only those concurrency schedules produced by the execution environment. Past efforts in thread verification have focused on multi-threaded programs synchronizing using locks and semaphores [9]. These methods are inapplicable for GPU kernels. Our work is tailored for CUDA which is very widely used; it will easily apply to emerging standards (e.g., OpenCL [16]). There are only a few GPU-specif...

1 | Checking non-interference
- Tripakis, Stergiou, et al.
- 2010
Citation Context ...lt to identify sufficiently precise invariants for non-trivial optimizations. Also, these checkers can handle only sequential programs. An equivalence checking method for CUDA kernels is discussed in [23]. It makes many assumptions and restrictions on the input programs and is not parameterized. No implementation of this method is reported. III. SMT ENCODING AND NON-PARAMETERIZED CHECKING. Although CUD...