A cool way of improving the reliability of HPC machines (2013)

by Osman Sarood, Esteban Meneses, L V Kale
Venue: In Proceedings of The International Conference for High Performance Computing, Networking, Storage and Analysis, IEEE/ACM SC’13