Publications
“Sampling Algorithms to Update Truncated SVD,”
IEEE International Conference on Big Data, Boston, MA, IEEE, December 2017.
“Scalable Data Generation for Evaluating Mixed-Precision Solvers,”
2020 IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA, USA, IEEE, September 2020.
DOI: 10.1109/HPEC43674.2020.9286145
“Scaling Point Set Registration in 3D Across Thread Counts on Multicore and Hardware Accelerator Platforms through Autotuning for Large Scale Analysis of Scientific Point Clouds,”
IEEE International Workshop on Benchmarking, Performance Tuning and Optimization for Big Data Applications (BPOD 2017), Boston, MA, IEEE, December 2017.
DOI: 10.1109/BigData.2017.8258258
“Self-Adaptive Multiprecision Preconditioners on Multicore and Manycore Architectures,”
VECPAR 2014, Eugene, OR, June 2014.
“SLATE: Design of a Modern Distributed and Accelerated Linear Algebra Library,”
International Conference for High Performance Computing, Networking, Storage and Analysis (SC19), Denver, CO, ACM, November 2019.
DOI: 10.1145/3295500.3356223
“Software-Defined Events through PAPI,”
2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Rio de Janeiro, Brazil, IEEE, May 2019.
DOI: 10.1109/IPDPSW.2019.00069
“Threshold Pivoting for Dense LU Factorization,”
ScalAH22: 13th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Heterogeneous Systems, Dallas, Texas, IEEE, November 2022.
DOI: 10.1109/ScalAH56622.2022.00010
“Towards Continuous Benchmarking,”
Platform for Advanced Scientific Computing Conference (PASC 2019), Zurich, Switzerland, ACM Press, June 2019.
DOI: 10.1145/3324989.3325719
“Towards Half-Precision Computation for Complex Matrices: A Case Study for Mixed Precision Solvers on GPUs,”
ScalA19: 10th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Denver, CO, IEEE, November 2019.
“Towards Numerical Benchmark for Half-Precision Floating Point Arithmetic,”
2017 IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA, IEEE, September 2017.
DOI: 10.1109/HPEC.2017.8091031
“Tridiagonalization of a Symmetric Dense Matrix on a GPU Cluster,”
The Third International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), May 2013.
“Twenty Years of Computational Science,”
International Conference on Computational Science (ICCS 2020), Amsterdam, Netherlands, June 2020.
“Using Additive Modifications in LU Factorization Instead of Pivoting,”
37th ACM International Conference on Supercomputing (ICS'23), Orlando, FL, ACM, June 2023.
DOI: 10.1145/3577193.3593731
“Using Advanced Vector Extensions AVX-512 for MPI Reduction,”
EuroMPI/USA '20: 27th European MPI Users' Group Meeting, Austin, TX, September 2020.
DOI: 10.1145/3416315.3416316
“Using Arm Scalable Vector Extension to Optimize Open MPI,”
20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID 2020), Melbourne, Australia, IEEE/ACM, May 2020.
DOI: 10.1109/CCGrid49817.2020.00-71
“Using GPU FP16 Tensor Cores Arithmetic to Accelerate Mixed-Precision Iterative Refinement Solvers and Reduce Energy Consumption,”
ISC High Performance (ISC'18), Best Poster, Frankfurt, Germany, June 2018.
“Using PAPI for Hardware Performance Monitoring on Linux Systems,”
Conference on Linux Clusters: The HPC Revolution, Urbana, Illinois, Linux Clusters Institute, June 2001.
“Utilizing Dataflow-based Execution for Coupled Cluster Methods,”
2014 IEEE International Conference on Cluster Computing, no. ICL-UT-14-02, Madrid, Spain, IEEE, September 2014.
“Variable-Size Batched Condition Number Calculation on GPUs,”
SBAC-PAD, Lyon, France, September 2018.
“Variable-Size Batched LU for Small Matrices and Its Integration into Block-Jacobi Preconditioning,”
46th International Conference on Parallel Processing (ICPP), Bristol, United Kingdom, IEEE, August 2017.
DOI: 10.1109/ICPP.2017.18
“Visualizing Execution Traces with Task Dependencies,”
2nd Workshop on Visual Performance Analysis (VPA '15), Austin, TX, ACM, November 2015.
“What it Takes to keep PAPI Instrumental for the HPC Community,”
1st Workshop on Sustainable Scientific Software (CW3S19), Collegeville, Minnesota, July 2019.
“Addressing Irregular Patterns of Matrix Computations on GPUs and Their Impact on Applications Powered by Sparse Direct Solvers,”
2022 International Conference for High Performance Computing, Networking, Storage and Analysis (SC22), Dallas, TX, IEEE Computer Society, pp. 354-367, November 2022.