Publications
“A High-Performance, Heterogeneous MPI,”
HeteroPar 2006, Barcelona, Spain, September 2006.
“High-Performance Conjugate-Gradient Benchmark: A New Metric for Ranking High-Performance Computing Systems,”
The International Journal of High Performance Computing Applications, 2015.
“Highly Scalable Self-Healing Algorithms for High Performance Scientific Computing,”
IEEE Transactions on Computers, vol. 58, issue 11, pp. 1512-1524, November 2009.
“High Performance RDMA Protocols in HPC,”
Euro PVM/MPI 2006, Bonn, Germany, September 2006.
“High Performance GridRPC Middleware,”
Recent developments in Grid Technology and Applications: Nova Science Publishers, 2008.
“High Performance Development for High End Computing with Python Language Wrapper (PLW),”
International Journal of High Performance Computing Applications (to appear), 2006.
“High Performance Development for High End Computing with Python Language Wrapper (PLW),”
International Journal of High Performance Computing Applications, vol. 21, no. 3, pp. 360-369, 2007.
“High Performance Dense Linear System Solver with Soft Error Resilience,”
IEEE Cluster 2011, Austin, TX, September 2011.
“High Performance Dense Linear System Solver with Resilience to Multiple Soft Errors,”
ICCS 2012, Omaha, NE, June 2012.
“High Performance Conjugate Gradient Benchmark: A new Metric for Ranking High Performance Computing Systems,”
International Journal of High Performance Computing Applications, vol. 30, issue 1, pp. 3-10, February 2016.
“High Performance Computing Trends,”
HERMIS, vol. 2, pp. 155-163, November 2001.
“High Performance Computing Systems: Status and Outlook,”
Acta Numerica, vol. 21, Cambridge, UK, Cambridge University Press, pp. 379-474, May 2012.
“High Performance Computing for Computational Science,”
Lecture Notes in Computer Science, vol. 2565, VECPAR 2002, 5th International Conference June 26-28, 2002, Springer-Verlag, Berlin, January 2003.
“High Performance Bidiagonal Reduction using Tile Algorithms on Homogeneous Multicore Architectures,”
ACM Transactions on Mathematical Software (TOMS), vol. 39, issue 3, no. 16, 2013.
“HierKNEM: An Adaptive Framework for Kernel-Assisted and Topology-Aware Collective Communications on Many-core Clusters,”
IPDPS 2012 (Best Paper), Shanghai, China, May 2012.
“Hierarchical QR Factorization Algorithms for Multi-core Cluster Systems,”
Parallel Computing, vol. 39, issue 4-5, pp. 212-232, May 2013.
“HARNESS Fault Tolerant MPI Design, Usage and Performance Issues,”
Future Generation Computer Systems, vol. 18, no. 8, pp. 1127-1142, January 2002.
“HARNESS and Fault Tolerant MPI,”
Parallel Computing, vol. 27, no. 11, pp. 1479-1496, January 2001.
“HARNESS: A Next Generation Distributed Virtual Machine,”
International Journal on Future Generation Computer Systems, vol. 15, no. 5-6, pp. 571-582, January 1999.
“Hardware-Counter Based Automatic Performance Analysis of Parallel Programs,”
Advances in Parallel Computing, vol. 13, Dresden, Germany, Elsevier, pp. 753-760, January 2004, 2003.
“A Guide for Achieving High Performance with Very Small Matrices on GPUs: A Case Study of Batched LU and Cholesky Factorizations,”
IEEE Transactions on Parallel and Distributed Systems, vol. 29, issue 5, pp. 973–984, May 2018.
“GrADSolve - A Grid-based RPC System for Remote Invocation of Parallel Software,”
Journal of Parallel and Distributed Computing (submitted), March 2003.
“The GrADS Project: Software Support for High-Level Grid Application Development,”
International Journal of High Performance Applications and Supercomputing, vol. 15, no. 4, pp. 327-344, January 2001.
“GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement,”
EuroPar 2012 (also LAWN 260), Rhodes Island, Greece, August 2012.
“GPU algorithms for Efficient Exascale Discretizations,”
Parallel Computing, vol. 108, pp. 102841, 2021.
“Ginkgo—A math library designed for platform portability,”
Parallel Computing, vol. 111, pp. 102902, February 2022.
“Ginkgo: A Modern Linear Operator Algebra Framework for High Performance Computing,”
ACM Transactions on Mathematical Software, vol. 48, issue 12, pp. 1-33, March 2022.
“Ginkgo: A High Performance Numerical Linear Algebra Library,”
Journal of Open Source Software, vol. 5, issue 52, August 2020.
“A Generic Approach to Scheduling and Checkpointing Workflows,”
Int. Journal of High Performance Computing Applications, vol. 33, no. 6, pp. 1255-1274, 2019.
“A Generic Approach to Scheduling and Checkpointing Workflows,”
International Journal of High Performance Computing Applications, vol. 33, issue 6, pp. 1255-1274, November 2019.
“The Future of Supercomputing: An Interim Report,”
National Research Council, Washington, D.C., The National Academies Press, January 2003.
“FT-MPI, Fault-Tolerant Metacomputing and Generic Name Services: A Case Study,”
Lecture Notes in Computer Science, vol. 4192, no. ICL-UT-06-14: Springer Berlin / Heidelberg, pp. 133-140, 2006.
“From CUDA to OpenCL: Towards a Performance-portable Solution for Multi-platform GPU Programming,”
Parallel Computing, vol. 38, no. 8, pp. 391-407, August 2012.
“A Framework for Out of Memory SVD Algorithms,”
ISC High Performance 2017, pp. 158–178, June 2017.
“Flexible collective communication tuning architecture applied to Open MPI,”
2006 Euro PVM/MPI (submitted), Bonn, Germany, January 2006.
“Fine-grained Bit-Flip Protection for Relaxation Methods,”
Journal of Computational Science, November 2016.
“Fault Tolerance of MPI Applications in Exascale Systems: The ULFM Solution,”
Future Generation Computer Systems, vol. 106, pp. 467-481, May 2020.
“Fast Cholesky Factorization on GPUs for Batch and Native Modes in MAGMA,”
Journal of Computational Science, vol. 20, pp. 85–93, May 2017.
“A Failure Detector for HPC Platforms,”
The International Journal of High Performance Computing Applications, vol. 32, issue 1, pp. 139–158, January 2018.
“Factorization and Inversion of a Million Matrices using GPUs: Challenges and Countermeasures,”
Procedia Computer Science, vol. 108, pp. 606–615, June 2017.
“Extending the scope of the Checkpoint-on-Failure protocol for forward recovery in standard MPI,”
Concurrency and Computation: Practice and Experience, July 2013.
“Exploiting the Performance of 32 bit Floating Point Arithmetic in Obtaining 64 bit Accuracy,”
University of Tennessee Computer Science Tech Report, no. UT-CS-06-574, LAPACK Working Note #175, April 2006.
“Exploiting Mixed Precision Floating Point Hardware in Scientific Computations,”
in High Performance Computing and Grids in Action, Amsterdam, IOS Press, January 2008.
“Exploiting Mixed Precision Floating Point Hardware in Scientific Computations,”
in High Performance Computing and Grids in Action (to appear), Amsterdam, IOS Press, 2007.
“Exploiting Block Structures of KKT Matrices for Efficient Solution of Convex Optimization Problems,”
IEEE Access, 2021.
“Experiences with Windows 95/NT as a Cluster Computing Platform for Parallel Computing,”
Parallel and Distributed Computing Practices, Special Issue: Cluster Computing, vol. 2, no. 2: Nova Science Publishers, USA, pp. 119-128, February 1999.
“Experiences in Autotuning Matrix Multiplication for Energy Minimization on GPUs,”
Concurrency and Computation: Practice and Experience, vol. 27, issue 17, pp. 5096-5113, October 2015.
“Experiences in autotuning matrix multiplication for energy minimization on GPUs,”
Concurrency and Computation: Practice and Experience, vol. 27, issue 17, pp. 5096-5113, December 2015.
“The evolution of mathematical software,”
Communications of the ACM, vol. 65, issue 12, pp. 66-72, December 2022.
“Evaluations of molecular modeling and machine learning for predictive capabilities in binding of lanthanum and actinium with carboxylic acids,”
Journal of Radioanalytical and Nuclear Chemistry, December 2022.
“