Results 1 - 2 of 2
Lightweight silent data corruption detection based on runtime data analysis for HPC applications
In Proc. HPDC, 2015
Abstract

Cited by 2 (1 self)
Next-generation supercomputers are expected to have more components and, at the same time, consume several times less energy per operation. Consequently, the number of soft errors is expected to increase dramatically in the coming years. In this respect, techniques that leverage certain properties of iterative HPC applications (such as the smoothness of the evolution of a particular dataset) can be used to detect silent errors at the application level. In this paper, we present a pointwise detection model with two phases: one involving the prediction of the next expected value in the time series for each data point, and another determining a range (i.e., normal value interval) surrounding the predicted next-step value. We show that dataset correlation can be used to detect corruptions indirectly and limit the size of the data set to monitor, taking advantage of the underlying physics of the simulation. Our results show that, using our techniques, we can detect a large number of corruptions (i.e., above 90% in some cases) with 84% memory overhead and 13.75% extra computation time.
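The two-phase pointwise model described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear-extrapolation predictor, the radius rule based on the largest recent step, and the names `predict_next`, `detection_radius`, `is_suspicious`, and `slack` are all assumptions made for illustration only.

```python
# Hypothetical sketch of a two-phase pointwise detector for one data point.
# Phase 1 predicts the next value; phase 2 builds a normal-value interval
# around that prediction. Both rules below are assumed, not taken from the paper.

def predict_next(history):
    """Phase 1 (assumed predictor): linear extrapolation from the last two values.

    `history` must contain at least one value.
    """
    if len(history) < 2:
        return history[-1]
    return 2 * history[-1] - history[-2]


def detection_radius(history, slack=2.0):
    """Phase 2 (assumed rule): radius = slack * largest step seen so far."""
    steps = [abs(b - a) for a, b in zip(history, history[1:])]
    # With no observed steps there is nothing to calibrate against,
    # so accept any value.
    return slack * max(steps) if steps else float("inf")


def is_suspicious(history, observed, slack=2.0):
    """Flag `observed` if it falls outside the interval around the prediction."""
    predicted = predict_next(history)
    radius = detection_radius(history, slack)
    return abs(observed - predicted) > radius


if __name__ == "__main__":
    smooth = [1.0, 1.1, 1.2, 1.3]        # smoothly evolving data point
    print(is_suspicious(smooth, 1.41))   # value close to the prediction
    print(is_suspicious(smooth, 9.3))    # value far outside the interval
```

A wider `slack` lowers the false-positive rate at the cost of missing small corruptions. Keeping per-point history is also what drives the memory overhead in a scheme of this kind.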
COFFEE: an Optimizing Compiler for Finite Element Local Assembly
Abstract
The numerical solution of partial differential equations using the finite element method is one of the key applications of high performance computing. Local assembly is its characteristic operation. This entails the execution of a problem-specific kernel to numerically evaluate an integral for each element in the discretized problem domain. Since the domain size can be huge, executing efficient kernels is fundamental. Their optimization is, however, a challenging issue. Even though affine loop nests are generally present, the short trip counts and the complexity of mathematical expressions make it hard to determine a single or unique sequence of successful transformations. Therefore, we present the design and systematic evaluation of COFFEE, a domain-specific compiler for local assembly kernels. COFFEE manipulates abstract syntax trees generated from a high-level domain-specific language for PDEs by introducing domain-aware composable optimizations aimed at improving instruction-level parallelism, especially SIMD vectorization, and register locality. It then generates C code including vector intrinsics. Experiments using a range of finite-element