Publications
“Using PAPI for Hardware Performance Monitoring on Linux Systems,”
Conference on Linux Clusters: The HPC Revolution, Urbana, Illinois, Linux Clusters Institute, June 2001.
Using Quantized Integer in LU Factorization with Partial Pivoting (Poster),
Seattle, WA, SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP20), February 2020.
“Using Software-Based Performance Counters to Expose Low-Level Open MPI Performance Information,”
EuroMPI, Chicago, IL, ACM, September 2017.
“Utilizing Dataflow-based Execution for Coupled Cluster Methods,”
2014 IEEE International Conference on Cluster Computing, no. ICL-UT-14-02, Madrid, Spain, IEEE, September 2014.
“Variable-Size Batched Condition Number Calculation on GPUs,”
SBAC-PAD, Lyon, France, September 2018.
“Variable-Size Batched Gauss-Huard for Block-Jacobi Preconditioning,”
International Conference on Computational Science (ICCS 2017), vol. 108, Zurich, Switzerland, Procedia Computer Science, pp. 1783-1792, June 2017.
“Variable-Size Batched Gauss-Jordan Elimination for Block-Jacobi Preconditioning on Graphics Processors,”
Parallel Computing, vol. 81, pp. 131-146, January 2019.
“Variable-Size Batched LU for Small Matrices and Its Integration into Block-Jacobi Preconditioning,”
46th International Conference on Parallel Processing (ICPP), Bristol, United Kingdom, IEEE, August 2017.
“VGrADS: Enabling e-Science Workflows on Grids and Clouds with Fault Tolerance,”
SC’09 The International Conference for High Performance Computing, Networking, Storage and Analysis (to appear), Portland, OR, 2009.
“The Virtual Instrument: Support for Grid-enabled Scientific Simulations,”
Journal of Parallel and Distributed Computing (submitted), October 2002.
“The Virtual Instrument: Support for Grid-enabled Scientific Simulations,”
International Journal of High Performance Computing Applications, vol. 18, no. 1, pp. 3-17, January 2004.
“Virtual Systolic Array for QR Decomposition,”
15th Workshop on Advances in Parallel and Distributed Computational Models, IEEE International Parallel & Distributed Processing Symposium (IPDPS 2013), Boston, MA, IEEE, May 2013.
“VisPerf: Monitoring Tool for Grid Computing,”
Lecture Notes in Computer Science, vol. 2659, Springer Verlag, Heidelberg, pp. 233-243, 2003.
“Visualizing Execution Traces with Task Dependencies,”
2nd Workshop on Visual Performance Analysis (VPA '15), Austin, TX, ACM, November 2015.
“Visualizing the Program Execution Control Flow of OpenMP Applications,”
Proc. 4th International Workshop on OpenMP (IWOMP 2008), West Lafayette, Indiana, Lecture Notes in Computer Science 5004, pp. 181-190, January 2008.
“Weighted Block-Asynchronous Iteration on GPU-Accelerated Systems,”
Tenth International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms (Best Paper), Rhodes Island, Greece, August 2012.
“Weighted Block-Asynchronous Relaxation for GPU-Accelerated Systems,”
SIAM Journal on Computing (submitted), March 2012.
“Weighted Dynamic Scheduling with Many Parallelism Grains for Offloading of Numerical Workloads to Multiple Varied Accelerators,”
Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA'15), vol. No. 5, Austin, TX, ACM, November 2015.
“The 'Weighted Modification' Incomplete Factorisation Method,”
University of Tennessee Computer Science Department Technical Report, no. UT-CS-99-436, December 1999.
“What it Takes to keep PAPI Instrumental for the HPC Community,”
1st Workshop on Sustainable Scientific Software (CW3S19), Collegeville, Minnesota, July 2019.
What it Takes to keep PAPI Instrumental for the HPC Community,
Collegeville, MN, The 2019 Collegeville Workshop on Sustainable Scientific Software (CW3S19), July 2019.
“When to checkpoint at the end of a fixed-length reservation?,”
Fault Tolerance for HPC at eXtreme Scales (FTXS) Workshop, Denver, United States, August 2023.
“With Extreme Computing, the Rules Have Changed,”
Computing in Science & Engineering, vol. 19, issue 3, pp. 52-62, May 2017.
XaaS: Acceleration as a Service to Enable Productive High-Performance Cloud Computing,
arXiv, January 2024.
xSDK4ECP: Extreme-scale Scientific Software Development Kit for ECP (Poster),
Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
Is your scheduling good? How would you know?,
Bordeaux, France, 14th Scheduling for Large Scale Systems Workshop, June 2019.