Publications
Task Based Cholesky Decomposition on Xeon Phi Architectures using OpenMP,”
International Journal of Computational Science and Engineering (IJCSE), vol. 17, no. 3, October 2018.
DOI: http://dx.doi.org/10.1504/IJCSE.2018.095851
“Towards Dense Linear Algebra for Hybrid GPU Accelerated Manycore Systems,”
Parallel Computing, vol. 36, no. 5-6, pp. 232-240, 00 2010.
(606.41 KB)
“Translational Process: Mathematical Software Perspective,”
Journal of Computational Science, September 2020.
DOI: 10.1016/j.jocs.2020.101216 (752.59 KB)
“Translational process: Mathematical software perspective,”
Journal of Computational Science, vol. 52, pp. 101216, 2021.
DOI: 10.1016/j.jocs.2020.101216
“Tuning Principal Component Analysis for GRASS GIS on Multi-core and GPU Architectures,”
FOSS4G 2010, Barcelona, Spain, September 2010.
(1.57 MB)
“Using Jacobi Iterations and Blocking for Solving Sparse Triangular Systems in Incomplete Factorization Preconditioning,”
Journal of Parallel and Distributed Computing, vol. 119, pp. 219–230, November 2018.
DOI: 10.1016/j.jpdc.2018.04.017 (273.53 KB)
“Using long vector extensions for MPI reductions,”
Parallel Computing, vol. 109, pp. 102871, March 2022.
DOI: 10.1016/j.parco.2021.102871
“Using MAGMA with PGI Fortran,”
PGI Insider, November 2010.
(176.67 KB)
“Using Mixed Precision for Sparse Matrix Computations to Enhance the Performance while Achieving 64-bit Accuracy,”
ACM Transactions on Mathematical Software, vol. 34, no. 4, pp. 17-22, 00 2008.
(364.48 KB)
“Variable-Size Batched Gauss-Jordan Elimination for Block-Jacobi Preconditioning on Graphics Processors,”
Parallel Computing, vol. 81, pp. 131-146, January 2019.
DOI: 10.1016/j.parco.2017.12.006 (1.9 MB)
“With Extreme Computing, the Rules Have Changed,”
Computing in Science & Engineering, vol. 19, issue 3, pp. 52-62, May 2017.
DOI: 10.1109/MCSE.2017.48 (485.34 KB)
“Accelerating FFT towards Exascale Computing
: NVIDIA GPU Technology Conference (GTC2021), 2021.
(27.23 MB)
Accelerating Tensor Contractions for High-Order FEM on CPUs, GPUs, and KNLs
, Gatlinburg, TN, moky Mountains Computational Sciences and Engineering Conference (SMC16), Poster, September 2016.
(4.29 MB)
Cholesky Factorization on Batches of Matrices with Fixed and Variable Sizes
, San Jose, CA, GPU Technology Conference (GTC16), Poster, April 2016.
(480.51 KB)
Clover: Computational Libraries Optimized via Exascale Research
, Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
(872 KB)
DTE: PaRSEC Enabled Libraries and Applications
: 2021 Exascale Computing Project Annual Meeting, April 2021.
(3.24 MB)
DTE: PaRSEC Enabled Libraries and Applications (Poster)
, Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
(979.27 KB)
DTE: PaRSEC Systems and Interfaces (Poster)
, Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
(840.54 KB)
Enhancing the Performance of Dense Linear Algebra Solvers on GPUs (in the MAGMA Project)
, Austin, TX, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC08), November 2008.
(5.28 MB)
Exa-PAPI: The Exascale Performance API with Modern C++
, Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
(556.78 KB)
FFT-ECP Fast Fourier Transform
, Houston, TX, 2019 ECP Annual Meeting (Research Poster), January 2019.
(1.51 MB)
Ginkgo: A Node-Level Sparse Linear Algebra Library for HPC (Poster)
, Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
(699 KB)
GPUDirect MPI Communications and Optimizations to Accelerate FFTs on Exascale Systems,”
EuroMPI'19 Posters, Zurich, Switzerland, no. icl-ut-19-06: ICL, September 2019.
(2.25 MB)
“Harnessing GPU's Tensor Cores Fast FP16 Arithmetic to Speedup Mixed-Precision Iterative Refinement Solvers and Achieve 74 Gflops/Watt on Nvidia V100
, San Jose, CA, GPU Technology Conference (GTC), Poster, March 2018.
(2.96 MB)
heFFTe: Highly Efficient FFT for Exascale (Poster)
: NVIDIA GPU Technology Conference (GTC2020), October 2020.
(866.88 KB)
heFFTe: Highly Efficient FFT for Exascale (Poster)
, Seattle, WA, SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP20), February 2020.
(1.54 MB)