AccMon: Automatically Detecting Memory-related Bugs via Program Counter-based Invariants
- In 37th International Symposium on Microarchitecture (MICRO), 2004
Cited by 47 (10 self)
This paper makes two contributions to architectural support for software debugging. First, it proposes a novel statistics-based, on-the-fly bug detection method called PC-based invariant detection. The idea is based on the observation that, in most programs, a given memory location is typically accessed by only a few instructions. Therefore, by capturing the invariant of the set of PCs that normally access a given variable, we can detect accesses by outlier instructions, which are often caused by memory corruption, buffer overflow, stack smashing or other memory-related bugs. Since this method is statistics-based, it can detect bugs that do not violate any programming rules and that, therefore, are likely to be missed by many existing tools. The second contribution is a novel architectural extension called the Check Look-aside Buffer (CLB). The CLB uses a Bloom filter to reduce monitoring overheads in the recently proposed iWatcher architectural framework for software debugging. The CLB significantly reduces the overhead of PC-based invariant debugging.
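As a rough illustration of the idea in the abstract above, the following sketch records, for each memory address, the set of program counters (PCs) seen during a training phase, and then flags accesses from PCs outside that set. It is a software toy, not the AccMon hardware design: the class names, the trace format, and the way the small Bloom filter is used (only to skip addresses that were never monitored) are assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


class TinyBloom:
    """Small Bloom filter over integers (illustrative sizes, not tuned)."""

    def __init__(self, bits=1024, hashes=3):
        self.bits = bits
        self.hashes = hashes
        self.field = 0  # bit field stored in a Python int

    def _positions(self, value):
        for seed in range(self.hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, value):
        for pos in self._positions(value):
            self.field |= 1 << pos

    def might_contain(self, value):
        # False means "definitely absent"; True may be a false positive.
        return all((self.field >> pos) & 1 for pos in self._positions(value))


class PCInvariantMonitor:
    """Hypothetical monitor: address -> set of PCs observed in training."""

    def __init__(self):
        self.allowed = defaultdict(set)
        self.monitored = TinyBloom()

    def train(self, trace):
        # trace is an iterable of (pc, address) pairs from normal runs.
        for pc, addr in trace:
            self.allowed[addr].add(pc)
            self.monitored.add(addr)

    def check(self, pc, addr):
        """Return True if the access looks like an outlier."""
        if not self.monitored.might_contain(addr):
            return False  # address was never monitored; nothing to compare
        known = self.allowed.get(addr)
        if known is None:
            return False  # Bloom filter false positive
        return pc not in known


monitor = PCInvariantMonitor()
monitor.train([(0x400A10, 0x7FFF0010), (0x400A24, 0x7FFF0010)])
print(monitor.check(0x400A10, 0x7FFF0010))  # PC seen in training for this address
print(monitor.check(0x400BEE, 0x7FFF0010))  # PC never seen for this address
```

An outlier report from such a monitor is only a hint that something unusual happened; as the abstract notes, the method is statistical, so a legitimate but rarely executed instruction could also be flagged.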
KeY-C: A tool for verification of C programs
- In Proceedings of 21st Conference on Automated Deduction (CADE-21), 2007
Cited by 13 (1 self)
We present KeY-C, a tool for deductive verification of C programs. KeY-C allows proving partial correctness of C programs relative to pre- and postconditions. It is based on a version of KeY that supports Java Card. In this paper we give a glimpse of the syntax, semantics, and calculus of C Dynamic Logic (CDL), which were adapted from their Java Card counterparts, based on an example. Currently, the tool is in an early development stage.
Optimizing Irregular Shared-Memory Applications for Distributed-Memory Systems
- Proceedings of the Symposium on Principles and Practice of Parallel Programming, 2006
Cited by 4 (1 self)
In previous work, we have proposed techniques to extend the ease of shared-memory parallel programming to distributed-memory platforms by automatic translation of OpenMP programs to MPI. In the case of irregular applications, the performance of this translation scheme is limited by the fact that accesses to shared data cannot be accurately resolved at compile time. Additionally, irregular applications with high communication-to-computation ratios pose challenges even for direct implementation on message passing systems. In this paper, we present combined compile-time/run-time techniques for optimizing irregular shared-memory applications on message passing systems in the context of automatic translation from OpenMP to MPI. The goal of our transformation is to enable computation-communication overlap by restructuring irregular parallel loops. In our approach, the compiler creates inspectors to analyze actual data access patterns for irregular accesses at runtime. This analysis is combined with the compile-time analysis of regular data accesses to determine which iterations of irregular loops access non-local data. The iterations are then reordered to enable computation-communication overlap. In the case where the irregular access occurs inside a nested loop, the nested loop is restructured. We evaluate our techniques by translating OpenMP versions of benchmarks from two important classes of irregular applications: sparse matrix computations and molecular dynamics. We compare the performance obtained by the MPI versions created using techniques proposed in this paper with the performance of MPI versions created using the baseline OpenMP to MPI translation proposed in previous work, as well as with the performance of
∗ This material is based upon work supported in part by the National Science Foundation under Grants No. 9974976-EIA, 0103582-EIA, and 0429535-CCF. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of
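The inspector-based reordering described in the abstract above can be sketched in a few lines. This is only a single-process toy under stated assumptions, not the paper's compiler transformation: the function names `inspect` and `execute`, the block ownership range, and the `fetch_remote` callback (a stand-in for message passing) are all hypothetical.

```python
def inspect(index_array, local_lo, local_hi):
    """Inspector: split iterations by whether the indirect access
    index_array[i] falls inside the locally owned block [local_lo, local_hi)."""
    local_iters, remote_iters = [], []
    for i, target in enumerate(index_array):
        if local_lo <= target < local_hi:
            local_iters.append(i)
        else:
            remote_iters.append(i)
    return local_iters, remote_iters


def execute(index_array, x, y, local_lo, local_hi, fetch_remote):
    """Executor: run local iterations first, then those needing remote data.

    fetch_remote is a hypothetical stand-in for communication. Here it is a
    plain synchronous call; in a message passing program the transfer would
    be started before the local loop and completed after it, which is where
    the computation-communication overlap would come from.
    """
    local_iters, remote_iters = inspect(index_array, local_lo, local_hi)
    needed = sorted({index_array[i] for i in remote_iters})
    remote_values = fetch_remote(needed)
    for i in local_iters:
        y[i] += x[index_array[i]]
    for i in remote_iters:
        y[i] += remote_values[index_array[i]]
    return local_iters + remote_iters  # the reordered iteration sequence


x = [10, 20, 30, 40, 50, 60, 70, 80]  # pretend only x[0:4] is owned locally
idx = [5, 1, 7, 2]                    # irregular (indirect) access pattern
y = [0, 0, 0, 0]
order = execute(idx, x, y, 0, 4, lambda wanted: {j: x[j] for j in wanted})
print(order, y)
```

The reordering is safe in this toy because each iteration writes a distinct `y[i]`; loops with cross-iteration dependences would need more care.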
Experiences in using Cetus for source-to-source transformations
- In Proceedings of the 17th International Workshop on Languages and Compilers for Parallel Computing (LCPC), 2004
Cited by 3 (1 self)
Cetus is a compiler infrastructure for the source-to-source transformation of programs. Since its creation nearly three years ago, it has grown to over 12,000 lines of Java code, been made publicly available on the web, and become a basis for several research projects. We discuss our experience using Cetus for a selection of these research projects. The focus of this paper is not the projects themselves, but rather how Cetus made these projects possible, how the needs of these projects influenced the development of Cetus, and the solutions we applied to problems we encountered with the infrastructure. We believe the research community can benefit from such a discussion, as shown by the strong interest in the mini-workshop on compiler research infrastructures where some of this information was first presented.
Jaql: A scripting language for large scale semistructured data analysis
- VLDB, 2011
Cited by 2 (1 self)
This paper describes Jaql, a declarative scripting language for analyzing large semistructured datasets in parallel using Hadoop’s MapReduce framework. Jaql is currently used in IBM’s InfoSphere BigInsights [5] and Cognos Consumer Insight [9] products. Jaql’s design features are: (1) a flexible data model, (2) reusability, (3) varying levels of abstraction, and (4) scalability. Jaql’s data model is inspired by JSON and can be used to represent datasets that vary from flat, relational tables to collections of semistructured documents. A Jaql script can start without any schema and evolve over time from a partial to a rigid schema. Reusability is provided through the use of higher-order functions and by packaging related functions into modules. Most Jaql scripts work at a high level of abstraction for concise specification of logical operations (e.g., join), but Jaql’s notion of physical transparency also provides a lower level of abstraction if necessary. This allows users to pin down the evaluation plan of a script for greater control or even add new operators. The Jaql compiler automatically rewrites Jaql scripts so they can run in parallel on Hadoop. In addition to describing Jaql’s design, we present the results of scale-up experiments on Hadoop running Jaql scripts for intranet data analysis and log processing.
Experiences Developing the OpenUH Compiler and Runtime Infrastructure
Cited by 2 (2 self)
The OpenUH compiler is a branch of the open source Open64 compiler suite for C, C++, Fortran 95/2003, with support for a variety of targets including x86_64, IA-64, and IA-32. For the past several years, we have used OpenUH to conduct research in parallel programming models and their implementation, static and dynamic analysis of parallel applications, and compiler integration with external tools. In this paper, we describe the evolution of the OpenUH infrastructure and how we’ve used it to carry out our research and teaching efforts.
FCUDA: Enabling Efficient Compilation of CUDA Kernels onto FPGAs
Cited by 2 (1 self)
As growing power dissipation and thermal effects disrupted the rising clock frequency trend and threatened to annul Moore’s law, the computing industry has switched its route to higher performance through parallel processing. The rise of multi-core systems in all domains of computing has opened the door to heterogeneous multi-processors, where processors of different compute characteristics can be combined to effectively boost the performance per watt of different application kernels. GPUs and FPGAs are becoming very popular in PC-based heterogeneous systems for speeding up compute-intensive kernels of scientific, imaging and simulation applications. GPUs can execute hundreds of concurrent threads, while FPGAs provide customized concurrency for highly parallel kernels. However, exploiting the parallelism available in these applications is often not a push-button task. Often the programmer has to expose the application’s fine- and coarse-grained parallelism by using special APIs. CUDA is such a parallel-computing API that is driven by the GPGPU industry and is gaining significant popularity. In this work, we adapt the CUDA programming model into a new FPGA design flow called FCUDA, which efficiently maps the coarse- and fine-grained parallelism exposed in CUDA onto the reconfigurable fabric. Our CUDA-to-FPGA flow employs AutoPilot, an advanced high-level synthesis tool which enables high-abstraction FPGA programming. FCUDA is based on a source-to-source compilation that transforms the SPMD CUDA thread blocks into parallel C code for AutoPilot. We describe the details of our CUDA-to-FPGA flow and demonstrate the highly competitive performance of the resulting customized FPGA multi-core accelerators. To the best of our knowledge, this is the first CUDA-to-FPGA flow to demonstrate the applicability and potential advantage of using the CUDA programming model for high-performance computing in FPGAs.
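To make the phrase "transforms the SPMD CUDA thread blocks into parallel C code" in the abstract above more concrete, here is a minimal sketch of the general idea of turning an implicit thread grid into explicit loops. It is written in Python rather than C, it is not FCUDA's actual transformation, and the names `vector_add_kernel` and `run_as_loops` are hypothetical. It also assumes a kernel with no barrier synchronization between threads, which is what lets a plain inner loop stand in for the threads of a block.

```python
def vector_add_kernel(block_idx, thread_idx, block_dim, a, b, out):
    """SPMD-style body: each (block, thread) pair handles one element."""
    i = block_idx * block_dim + thread_idx
    if i < len(out):  # guard for the partially filled last block
        out[i] = a[i] + b[i]


def run_as_loops(kernel, grid_dim, block_dim, *args):
    """Execute the implicit thread grid with explicit nested loops.

    The outer loop over blocks corresponds to the coarse-grained parallelism
    (blocks are independent of each other); the inner loop over threads
    corresponds to the fine-grained part. Both run sequentially here.
    """
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, thread_idx, block_dim, *args)


a = [1, 2, 3, 4, 5]
b = [10, 20, 30, 40, 50]
out = [0] * len(a)
run_as_loops(vector_add_kernel, 2, 4, a, b, out)  # 2 blocks of 4 threads
print(out)
```

Because the block iterations are independent, a tool targeting parallel hardware could in principle replicate or pipeline them; this sketch only shows the loop structure, not any such scheduling.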
CU2CL: A CUDA-to-OpenCL Translator for Multi- and Many-core Architectures
Cited by 1 (1 self)
The use of graphics processing units (GPUs) in high-performance parallel computing continues to become more prevalent, often as part of a heterogeneous system. For years, CUDA has been the de facto programming environment for nearly all general-purpose GPU (GPGPU) applications. In spite of this, the framework is available only on NVIDIA GPUs, traditionally requiring reimplementation in other frameworks in order to utilize additional multi- or many-core devices. On the other hand, OpenCL provides an open and vendor-neutral programming environment and runtime system. With implementations available for CPUs, GPUs, and other types of accelerators, OpenCL therefore holds the promise of a “write once, run anywhere” ecosystem for heterogeneous computing. Given the many similarities between CUDA and OpenCL, manually porting a CUDA application to OpenCL is typically straightforward, albeit tedious and error-prone. In response to this issue, we created CU2CL, an automated CUDA-to-OpenCL source-to-source translator that possesses a novel design and clever reuse of the Clang compiler framework. Currently, the CU2CL translator covers the primary constructs found in the CUDA runtime API, and we have successfully translated many applications from the CUDA SDK and Rodinia benchmark suite. The performance of the automatically translated applications via CU2CL is on par with their manually ported counterparts.
A Dynamic Logic for Deductive Verification of C Programs with KeY-C
We present KeY-C: a tool for deductive verification of C programs. KeY-C allows verification of C programs w.r.t. operation contracts and invariants. It is based on an earlier version of KeY that supports Java Card. In this paper we outline the syntax, semantics, and calculus of C Dynamic Logic (CDL), which were adapted from their Java Card counterparts. Currently, the tool is in an early development stage. As a side-product of this work we expect to generalize the KeY architecture so that support for new programming languages can be added easily. This paper is a further development of our work described in [11].
USING PROGRAMMABLE LOGIC BY
This thesis presents a processor and memory-hierarchy prototype based on FPGAs that provides hardware support for program rollback. We use this prototype to demonstrate how compiler- or user-controlled speculative execution can help in debugging production codes. The system is based on a synthesizable VHDL implementation of a 32-bit processor compliant with the SPARC V8 architecture. We conduct experiments on applications with real bugs. The applications run on top of a version of Linux ported to this hardware. Our experiments show that our system is able to successfully execute the buggy code sections speculatively. This allows the thorough characterization of the faulty code through repeated rollback and re-execution. Moreover, the hardware extensions we made to the baseline system increase the hardware resource requirements by less than 4.5%.
Acknowledgments. I would like to thank my adviser, Josep Torrellas, for his technical, moral and financial support. I would also like to thank the IACOMA and PROBE research groups at University of Illinois for valuable discussions and feedback on our research.

