%0 Journal Article %J Journal of Computational Science %D 2016 %T Fine-grained Bit-Flip Protection for Relaxation Methods %A Hartwig Anzt %A Jack Dongarra %A Enrique S. Quintana-Orti %K Bit flips %K Fault tolerance %K High Performance Computing %K iterative solvers %K Jacobi method %K sparse linear systems %X Resilience is considered a challenging under-addressed issue that the high performance computing community (HPC) will have to face in order to produce reliable Exascale systems by the beginning of the next decade. As part of a push toward a resilient HPC ecosystem, in this paper we propose an error-resilient iterative solver for sparse linear systems based on stationary component-wise relaxation methods. Starting from a plain implementation of the Jacobi iteration, our approach introduces a low-cost component-wise technique that detects bit-flips, rejecting some component updates, and turning the initial synchronized solver into an asynchronous iteration. Our experimental study with sparse incomplete factorizations from a collection of real-world applications, and a practical GPU implementation, exposes the convergence delay incurred by the fault-tolerant implementation and its practical performance. %B Journal of Computational Science %8 2016-11 %G eng %R https://doi.org/10.1016/j.jocs.2016.11.013 %0 Journal Article %J Philosophical Transactions of the Royal Society A -- Mathematical, Physical and Engineering Sciences %D 2014 %T Improving the Energy Efficiency of Sparse Linear System Solvers on Multicore and Manycore Systems %A Hartwig Anzt %A Enrique S. Quintana-Orti %K energy efficiency %K graphics processing units %K High Performance Computing %K iterative solvers %K multicore processors %K sparse linear systems %X While most recent breakthroughs in scientific research rely on complex simulations carried out in large-scale supercomputers, the power draft and energy spent for this purpose is increasingly becoming a limiting factor to this trend. In this paper, we provide an overview of the current status in energy-efficient scientific computing by reviewing different technologies used to monitor power draft as well as power- and energy-saving mechanisms available in commodity hardware. For the particular domain of sparse linear algebra, we analyze the energy efficiency of a broad collection of hardware architectures and investigate how algorithmic and implementation modifications can improve the energy performance of sparse linear system solvers, without negatively impacting their performance. %B Philosophical Transactions of the Royal Society A -- Mathematical, Physical and Engineering Sciences %V 372 %8 2014-07 %G eng %N 2018 %R 10.1098/rsta.2013.0279