%0 Conference Proceedings
%B Software for Exascale Computing - SPPEXA
%D 2016
%T Domain Overlap for Iterative Sparse Triangular Solves on GPUs
%A Hartwig Anzt
%A Edmond Chow
%A Daniel Szyld
%A Jack Dongarra
%E Hans-Joachim Bungartz
%E Philipp Neumann
%E Wolfgang E. Nagel
%X Iterative methods for solving sparse triangular systems are an attractive alternative to exact forward and backward substitution if an approximation of the solution is acceptable. On modern hardware, performance benefits are available as iterative methods allow for better parallelization. In this paper, we investigate how block-iterative triangular solves can benefit from using overlap. Because the matrices are triangular, we use “directed” overlap, depending on whether the matrix is upper or lower triangular. We enhance a GPU implementation of the block-asynchronous Jacobi method with directed overlap. For GPUs and other cases where the problem must be overdecomposed, i.e., more subdomains and threads than cores, there is a preference in processing or scheduling the subdomains in a specific order, following the dependencies specified by the sparse triangular matrix. For sparse triangular factors from incomplete factorizations, we demonstrate that moderate directed overlap with subdomain scheduling can improve convergence and time-to-solution.
%B Software for Exascale Computing - SPPEXA
%S Lecture Notes in Computer Science and Engineering
%I Springer International Publishing
%V 113
%P 527–545
%8 2016-09
%G eng
%R 10.1007/978-3-319-40528-5_24

%0 Journal Article
%J International Journal of High Performance Computing
%D 2011
%T The International Exascale Software Project Roadmap
%A Jack Dongarra
%A Pete Beckman
%A Terry Moore
%A Patrick Aerts
%A Giovanni Aloisio
%A Jean-Claude Andre
%A David Barkai
%A Jean-Yves Berthou
%A Taisuke Boku
%A Bertrand Braunschweig
%A Franck Cappello
%A Barbara Chapman
%A Xuebin Chi
%A Alok Choudhary
%A Sudip Dosanjh
%A Thom Dunning
%A Sandro Fiore
%A Al Geist
%A Bill Gropp
%A Robert Harrison
%A Mark Hereld
%A Michael Heroux
%A Adolfy Hoisie
%A Koh Hotta
%A Zhong Jin
%A Yutaka Ishikawa
%A Fred Johnson
%A Sanjay Kale
%A Richard Kenway
%A David Keyes
%A Bill Kramer
%A Jesus Labarta
%A Alain Lichnewsky
%A Thomas Lippert
%A Bob Lucas
%A Barney MacCabe
%A Satoshi Matsuoka
%A Paul Messina
%A Peter Michielse
%A Bernd Mohr
%A Matthias S. Mueller
%A Wolfgang E. Nagel
%A Hiroshi Nakashima
%A Michael E. Papka
%A Dan Reed
%A Mitsuhisa Sato
%A Ed Seidel
%A John Shalf
%A David Skinner
%A Marc Snir
%A Thomas Sterling
%A Rick Stevens
%A Fred Streitz
%A Bob Sugar
%A Shinji Sumimoto
%A William Tang
%A John Taylor
%A Rajeev Thakur
%A Anne Trefethen
%A Mateo Valero
%A Aad van der Steen
%A Jeffrey Vetter
%A Peg Williams
%A Robert Wisniewski
%A Kathy Yelick
%X Over the last 20 years, the open-source community has provided more and more software on which the world’s high-performance computing systems depend for performance and productivity. The community has invested millions of dollars and years of effort to build key components.
However, although the investments in these separate software elements have been tremendously valuable, a great deal of productivity has also been lost because of the lack of planning, coordination, and key integration of technologies necessary to make them work together smoothly and efficiently, both within individual petascale systems and between different systems. It seems clear that this completely uncoordinated development model will not provide the software needed to support the unprecedented parallelism required for peta/exascale computation on millions of cores, or the flexibility required to exploit new hardware models and features, such as transactional memory, speculative execution, and graphics processing units. This report describes the work of the community to prepare for the challenges of exascale computing, ultimately combining their efforts in a coordinated International Exascale Software Project.
%B International Journal of High Performance Computing
%V 25
%P 3-60
%8 2011-01
%G eng
%R https://doi.org/10.1177/1094342010391989

%0 Journal Article
%J International Journal of High Performance Computing Applications (to appear)
%D 2010
%T Trace-based Performance Analysis for the Petascale Simulation Code FLASH
%A Heike Jagode
%A Andreas Knuepfer
%A Jack Dongarra
%A Matthias Jurenz
%A Matthias S. Mueller
%A Wolfgang E. Nagel
%B International Journal of High Performance Computing Applications (to appear)
%8 2010-00
%G eng

%0 Journal Article
%J ISC'09
%D 2009
%T I/O Performance Analysis for the Petascale Simulation Code FLASH
%A Heike Jagode
%A Shirley Moore
%A Dan Terpstra
%A Jack Dongarra
%A Andreas Knuepfer
%A Matthias Jurenz
%A Matthias S. Mueller
%A Wolfgang E. Nagel
%K test
%B ISC'09
%C Hamburg, Germany
%8 2009-06
%G eng

%0 Generic
%D 2009
%T Trace-based Performance Analysis for the Petascale Simulation Code FLASH
%A Heike Jagode
%A Andreas Knuepfer
%A Jack Dongarra
%A Matthias Jurenz
%A Matthias S. Mueller
%A Wolfgang E. Nagel
%K test
%B Innovative Computing Laboratory Technical Report
%8 2009-04
%G eng