%0 Journal Article
%J IEEE Transactions on Parallel and Distributed Systems
%D 2022
%T Accelerating Geostatistical Modeling and Prediction With Mixed-Precision Computations: A High-Productivity Approach With PaRSEC
%A Abdulah, Sameh
%A Cao, Qinglei
%A Pei, Yu
%A Bosilca, George
%A Dongarra, Jack
%A Genton, Marc G.
%A Keyes, David E.
%A Ltaief, Hatem
%A Sun, Ying
%K Computational modeling
%K Covariance matrices
%K Data models
%K Maximum likelihood estimation
%K Predictive models
%K runtime
%K Task analysis
%X Geostatistical modeling, one of the prime motivating applications for exascale computing, is a technique for predicting desired quantities from geographically distributed data, based on statistical models and optimization of parameters. Spatial data are assumed to possess properties of stationarity or non-stationarity via a kernel fitted to a covariance matrix. A primary workhorse of stationary spatial statistics is Gaussian maximum log-likelihood estimation (MLE), whose central data structure is a dense, symmetric positive definite covariance matrix of the dimension of the number of correlated observations. Two essential operations in MLE are the application of the inverse and evaluation of the determinant of the covariance matrix. These can be rendered through the Cholesky decomposition and triangular solution. In this contribution, we reduce the precision of weakly correlated locations to single- or half-precision based on distance. We thus exploit mathematical structure to migrate MLE to a three-precision approximation that takes advantage of contemporary architectures offering BLAS3-like operations in a single instruction that are extremely fast for reduced precision. We illustrate application-expected accuracy worthy of double-precision from a majority half-precision computation, in a context where uniform single-precision is by itself insufficient.
In tackling the complexity and imbalance caused by the mixing of three precisions, we deploy the PaRSEC runtime system. PaRSEC delivers on-demand casting of precisions while orchestrating tasks and data movement in a multi-GPU distributed-memory environment within a tile-based Cholesky factorization. Application-expected accuracy is maintained while achieving up to 1.59X by mixing FP64/FP32 operations on 1536 nodes of HAWK or 4096 nodes of Shaheen II, and up to 2.64X by mixing FP64/FP32/FP16 operations on 128 nodes of Summit, relative to FP64-only operations. This translates into up to 4.5, 4.7, ...
%B IEEE Transactions on Parallel and Distributed Systems
%V 33
%P 964-976
%8 2022-04
%G eng
%U https://ieeexplore.ieee.org/document/9442267/
%U https://ieeexplore.ieee.org/ielam/71/9575177/9442267-aam.pdf
%U http://xplorestaging.ieee.org/ielx7/71/9575177/09442267.pdf?arnumber=9442267
%N 4
%! IEEE Trans. Parallel Distrib. Syst.
%R 10.1109/TPDS.2021.3084071

%0 Conference Proceedings
%B 2022 International Conference for High Performance Computing, Networking, Storage and Analysis (SC22)
%D 2022
%T Reshaping Geostatistical Modeling and Prediction for Extreme-Scale Environmental Applications
%A Cao, Qinglei
%A Abdulah, Sameh
%A Alomairy, Rabab
%A Pei, Yu
%A Nag, Pratik
%A Bosilca, George
%A Dongarra, Jack
%A Genton, Marc G.
%A Keyes, David
%A Ltaief, Hatem
%A Sun, Ying
%K climate/weather prediction
%K dynamic runtime systems
%K high performance computing
%K low-rank matrix approximations
%K mixed-precision computations
%K space-time geospatial statistics
%K Task-based programming models
%X We extend the capability of space-time geostatistical modeling using algebraic approximations, illustrating application-expected accuracy worthy of double precision from majority low-precision computations and low-rank matrix approximations.
We exploit the mathematical structure of the dense covariance matrix whose inverse action and determinant are repeatedly required in Gaussian log-likelihood optimization. Geostatistics augments first-principles modeling approaches for the prediction of environmental phenomena given the availability of measurements at a large number of locations; however, traditional Cholesky-based approaches grow cubically in complexity, gating practical extension to continental and global datasets now available. We combine the linear algebraic contributions of mixed-precision and low-rank computations within a tile-based Cholesky solver with on-demand casting of precisions and dynamic runtime support from PaRSEC to orchestrate tasks and data movement. Our adaptive approach scales on various systems and leverages the Fujitsu A64FX nodes of Fugaku to achieve up to 12X performance speedup against the highly optimized dense Cholesky implementation.
%B 2022 International Conference for High Performance Computing, Networking, Storage and Analysis (SC22)
%I IEEE Press
%C Dallas, TX
%8 2022-11
%@ 9784665454445
%G eng
%U https://dl.acm.org/doi/abs/10.5555/3571885.3571888