Publications
Algorithm-Based Checkpoint-Free Fault Tolerance for Parallel Matrix Computations on Volatile Resources,”
IPDPS 2006, 20th IEEE International Parallel and Distributed Processing Symposium, Rhodes Island, Greece, January 2006.
(266.54 KB)
“Algebraic Schwarz Preconditioning for the Schur Complement: Application to the Time-Harmonic Maxwell Equations Discretized by a Discontinuous Galerkin Method.,”
The Twentieth International Conference on Domain Decomposition Methods, La Jolla, California, February 2011.
“An Algebra for Cross-Experiment Performance Analysis,”
2004 International Conference on Parallel Processing (ICCP-04), Montreal, Quebec, Canada, August 2004.
(166.12 KB)
“Addressing Irregular Patterns of Matrix Computations on GPUs and Their Impact on Applications Powered by Sparse Direct Solvers,”
2022 International Conference for High Performance Computing, Networking, Storage and Analysis (SC22), Dallas, TX, IEEE Computer Society, pp. 354-367, November 2022.
(1.57 MB)
“Active Logistical State Management in the GridSolve/L,”
4th International Symposium on Cluster Computing and the Grid (CCGrid 2004)(submitted), Chicago, Illinois, January 2004.
(123.69 KB)
““8th International Conference on Parallel Processing and Applied Mathematics, Lecture Notes in Computer Science (LNCS),”
PPAM 2009 Proceedings, vol. 6067, Wroclaw, Poland, Springer, September 2010.
3-D parallel frequency-domain visco-acoustic wave modelling based on a hybrid direct/iterative solver,”
73rd EAGE Conference & Exhibition incorporating SPE EUROPEC 2011, Vienna, Austria, 23-26 May, 00 2011.
“When to checkpoint at the end of a fixed-length reservation?,”
Fault Tolerance for HPC at eXtreme Scales (FTXS) Workshop, Denver, United States, August 2023.
“What it Takes to keep PAPI Instrumental for the HPC Community,”
1st Workshop on Sustainable Scientific Software (CW3S19), Collegeville, Minnesota, July 2019.
(50.57 KB)
“Visualizing Execution Traces with Task Dependencies,”
2nd Workshop on Visual Performance Analysis (VPA '15), Austin, TX, ACM, November 2015.
(927.5 KB)
“Virtual Systolic Array for QR Decomposition,”
15th Workshop on Advances in Parallel and Distributed Computational Models, IEEE International Parallel & Distributed Processing Symposium (IPDPS 2013), Boston, MA, IEEE, May 2013.
(749.84 KB)
“Variable-Size Batched LU for Small Matrices and Its Integration into Block-Jacobi Preconditioning,”
46th International Conference on Parallel Processing (ICPP), Bristol, United Kingdom, IEEE, August 2017.
“Variable-Size Batched Condition Number Calculation on GPUs,”
SBAC-PAD, Lyon, France, September 2018.
(509.3 KB)
“Utilizing Dataflow-based Execution for Coupled Cluster Methods,”
2014 IEEE International Conference on Cluster Computing, no. ICL-UT-14-02, Madrid, Spain, IEEE, September 2014.
(260.23 KB)
“Using Software-Based Performance Counters to Expose Low-Level Open MPI Performance Information,”
EuroMPI, Chicago, IL, ACM, September 2017.
(745.58 KB)
“Using PAPI for Hardware Performance Monitoring on Linux Systems,”
Conference on Linux Clusters: The HPC Revolution, Urbana, Illinois, Linux Clusters Institute, June 2001.
(422.35 KB)
“Using GPU FP16 Tensor Cores Arithmetic to Accelerate Mixed-Precision Iterative Refinement Solvers and Reduce Energy Consumption,”
ISC High Performance (ISC'18), Best Poster, Frankfurt, Germany, June 2018.
(3.01 MB)
“Using Arm Scalable Vector Extension to Optimize Open MPI,”
20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID 2020), Melbourne, Australia, IEEE/ACM, May 2020.
(359.95 KB)
“Using Advanced Vector Extensions AVX-512 for MPI Reduction,”
EuroMPI/USA '20: 27th European MPI Users' Group Meeting, Austin, TX, September 2020.
(634.45 KB)
“Using Additive Modifications in LU Factorization Instead of Pivoting,”
37th ACM International Conference on Supercomputing (ICS'23), Orlando, FL, ACM, June 2023.
(624.18 KB)
“Unified Development for Mixed Multi-GPU and Multi-Coprocessor Environments using a Lightweight Runtime Environment,”
IPDPS 2014, Phoenix, AZ, IEEE, May 2014.
(1.51 MB)
“Understanding Scalability and Fine-Grain Parallelism of Synchronous Data Parallel Training,”
2019 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC), Denver, CO, IEEE, November 2019.
(696.89 KB)
“Twenty Years of Computational Science,”
International Conference on Computational Science (ICCS 2020), Amsterdam, Netherlands, June 2020.
(149.66 KB)
“Tuning Stationary Iterative Solvers for Fault Resilience,”
6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA15), Austin, TX, ACM, November 2015.
(1.28 MB)
“Tridiagonalization of a Symmetric Dense Matrix on a GPU Cluster,”
The Third International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), May 2013.
“Towards Portable Online Prediction of Network Utilization Using MPI-Level Monitoring,”
2019 European Conference on Parallel Processing (Euro-Par 2019), Göttingen, Germany, Springer, August 2019.
(1.07 MB)
“Towards Numerical Benchmark for Half-Precision Floating Point Arithmetic,”
2017 IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA, IEEE, September 2017.
(1.67 MB)
“Towards Half-Precision Computation for Complex Matrices: A Case Study for Mixed Precision Solvers on GPUs,”
ScalA19: 10th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Denver, CO, IEEE, November 2019.
(523.87 KB) (3.42 MB)
“Towards Continuous Benchmarking,”
Platform for Advanced Scientific Computing Conference (PASC 2019), Zurich, Switzerland, ACM Press, June 2019.
(1.51 MB)
“Towards Batched Linear Solvers on Accelerated Hardware Platforms,”
8th Workshop on General Purpose Processing Using GPUs (GPGPU 8) co-located with PPOPP 2015, San Francisco, CA, ACM, February 2015.
(403.74 KB)
“Towards Achieving Performance Portability Using Directives for Accelerators,”
The International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16), Third Workshop on Accelerator Programming Using Directives (WACCPD), Salt Lake City, Utah, Innovative Computing Laboratory, University of Tennessee, November 2016.
(567.02 KB)
“Toward a scalable multi-GPU eigensolver via compute-intensive kernels and efficient communication,”
Proceedings of the 27th ACM International Conference on Supercomputing (ICS '13), Eugene, Oregon, USA, ACM Press, June 2013.
(1.27 MB)
“Threshold Pivoting for Dense LU Factorization,”
ScalAH22: 13th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Heterogeneous Systems , Dallas, Texas, IEEE, November 2022.
(721.77 KB)
“The Template Task Graph (TTG) - An Emerging Practical Dataflow Programming Paradigm for Scientific Simulation at Extreme Scale,”
2020 IEEE/ACM 5th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2): IEEE, November 2020.
(139.6 KB)
“Task-Based Programming for Seismic Imaging: Preliminary Results,”
2014 IEEE International Conference on High Performance Computing and Communications (HPCC), Paris, France, IEEE, August 2014.
(625.86 KB)
“Task-Based Polar Decomposition Using SLATE on Massively Parallel Systems with Hardware Accelerators,”
SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Denver, CO, ACM, November 2023.
“Task Bench: A Parameterized Benchmark for Evaluating Parallel Runtime Performance,”
International Conference for High Performance Computing Networking, Storage, and Analysis (SC20): ACM, November 2020.
(644.92 KB)
“Taking Advantage of Hybrid Systems for Sparse Direct Solvers via Task-Based Runtimes,”
23rd International Heterogeneity in Computing Workshop, IPDPS 2014, Phoenix, AZ, IEEE, May 2014.
(807.33 KB)
“Surrogate ML/AI Model Benchmarking for FAIR Principles' Conformance,”
2022 IEEE High Performance Extreme Computing Conference (HPEC): IEEE, September 2022.
“A Step towards Energy Efficient Computing: Redesigning A Hydrodynamic Application on CPU-GPU,”
IPDPS 2014, Phoenix, AZ, IEEE, May 2014.
(1.01 MB)
“Standards for Graph Algorithm Primitives,”
17th IEEE High Performance Extreme Computing Conference (HPEC '13), Waltham, MA, IEEE, September 2013.
(108.86 KB)
“Sparse Linear Algebra on AMD and NVIDIA GPUs—The Race is On,”
ISC High Performance: Springer, June 2020.
(5.63 MB)
“Software-Defined Events through PAPI,”
2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Rio de Janeiro, Brazil, IEEE, May 2019.
(446.41 KB)
“SLATE: Design of a Modern Distributed and Accelerated Linear Algebra Library,”
International Conference for High Performance Computing, Networking, Storage and Analysis (SC19), Denver, CO, ACM, November 2019.
(2.01 MB)
“Self-Adaptive Multiprecision Preconditioners on Multicore and Manycore Architectures,”
VECPAR 2014, Eugene, OR, June 2014.
(430.56 KB)
“Search Space Pruning Constraints Visualization,”
VISSOFT'14: 2nd IEEE Working Conference on Software Visualization, Victoria, BC, Canada, IEEE, September 2014.
(1.32 MB)
“Search Space Generation and Pruning System for Autotuners,”
30th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Chicago, IL, IEEE, May 2016.
(555.44 KB)
“Scheduling Independent Stochastic Tasks on Heterogeneous Cloud Platforms,”
IEEE Cluster 2019, Albuquerque, New Mexico, IEEE Computer Society Press, September 2019.
(651 KB)
“