|Load-balancing Sparse Matrix Vector Product Kernels on GPUs,”
ACM Transactions on Parallel Computing, issue 2, March 2020.
DOI: 10.1145/3380930 (5.64 MB)
|Parallel Selection on GPUs,”
Parallel Computing, November 2019.
DOI: 10.1016/j.parco.2019.102588 (1.43 MB)
|Toward a Modular Precision Ecosystem for High-Performance Computing,”
The International Journal of High Performance Computing Applications, vol. 33, issue 6, pp. 1069-1078, November 2019.
DOI: 10.1177/1094342019846547 (1.93 MB)
|Towards a New Peer Review Concept for Scientific Computing ensuring Technical Quality, Software Sustainability, and Result Reproducibility,”
Proceedings in Applied Mathematics and Mechanics, vol. 19, issue 1, November 2019.
|Distributed-Memory Lattice H-Matrix Factorization,”
The International Journal of High Performance Computing Applications, vol. 33, issue 5, pp. 1046–1063, August 2019.
DOI: 10.1177/1094342019861139 (1.14 MB)
|Are we Doing the Right Thing? – A Critical Analysis of the Academic HPC Community,”
2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Rio de Janeiro, Brazil, IEEE, May 2019.
DOI: 10.1109/IPDPSW.2019.00122 (622.32 KB)
|Matrix Powers Kernels for Thick-Restart Lanczos with Explicit External Deflation,”
International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, IEEE, May 2019.
|ParILUT – A Parallel Threshold ILU for GPUs,”
IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, IEEE, May 2019.
DOI: 10.1109/IPDPS.2019.00033 (505.95 KB)
|A Customized Precision Format Based on Mantissa Segmentation for Accelerating Sparse Linear Algebra,”
Concurrency and Computation: Practice and Experience, vol. 40319, issue 262, January 2019.
|Software-Defined Events (SDEs) in MAGMA-Sparse,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-18-12: University of Tennessee, December 2018.
|Variable-Size Batched Condition Number Calculation on GPUs,”
SBAC-PAD, Lyon, France, September 2018.
|ParILUT - A New Parallel Threshold ILU,”
SIAM Journal on Scientific Computing, vol. 40, issue 4: SIAM, pp. C503–C519, July 2018.
DOI: 10.1137/16M1079506 (19.26 MB)
|Solver Interface & Performance on Cori,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-18-05: University of Tennessee, June 2018.
|A Jaccard Weights Kernel Leveraging Independent Thread Scheduling on GPUs,”
SBAC-PAD, Lyon, France, IEEE, 2018.
|High-Performance GPU Implementation of PageRank with Reduced Precision based on Mantissa Segmentation,”
8th Workshop on Irregular Applications: Architectures and Algorithms, 2018.
|MAGMA-sparse Interface Design Whitepaper,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-17-05, September 2017.
|Variable-Size Batched LU for Small Matrices and Its Integration into Block-Jacobi Preconditioning,”
46th International Conference on Parallel Processing (ICPP), Bristol, United Kingdom, IEEE, August 2017.
|Improving Performance of GMRES by Reducing Communication and Pipelining Global Collectives,”
Proceedings of The 18th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2017), Best Paper Award, Orlando, FL, June 2017.
DOI: 10.1109/IPDPSW.2017.65 (453.66 KB)