Publications
Mixed-Precision Iterative Refinement using Tensor Cores on GPUs to Accelerate Solution of Linear Systems,”
Proceedings of the Royal Society A, vol. 476, issue 2243, November 2020.
(2.24 MB)
“Mixing LU-QR Factorization Algorithms to Design High-Performance Dense Linear Algebra Solvers,”
Journal of Parallel and Distributed Computing, vol. 85, pp. 32-46, November 2015.
(5.06 MB)
“Model-Driven One-Sided Factorizations on Multicore, Accelerated Systems,”
Supercomputing Frontiers and Innovations, vol. 1, issue 1, 2014.
(1.86 MB)
“MPI Collective Algorithm Selection and Quadtree Encoding,”
Lecture Notes in Computer Science, vol. 4192, no. ICL-UT-06-13: Springer Berlin / Heidelberg, pp. 40-48, September 2006.
(308.39 KB)
“MPI Collective Algorithm Selection and Quadtree Encoding,”
Parallel Computing (Special Edition: EuroPVM/MPI 2006): Elsevier, 00 2007.
(308.39 KB)
“Multi-GPU work sharing in a task-based dataflow programming model,”
Future Generation Computer Systems, vol. 156, pp. 313 - 324, July 2024.
“Multi-Level Checkpointing and Silent Error Detection for Linear Workflows,”
Journal of Computational Science, vol. 28, pp. 398–415, September 2018.
“Multithreading in the PLASMA Library,”
Multi and Many-Core Processing: Architecture, Programming, Algorithms, & Applications: Taylor & Francis, 00 2013.
(536.28 KB)
“NanoPSE: A Nanoscience Problem Solving Environment for Atomistic Electronic Structure of Semiconductor Nanostructures,”
Journal of Physics: Conference Series, issue 16, pp. 277-282, June 2005.
(476.64 KB)
“National HPCC Software Exchange (NHSE): Uniting the High Performance Computing and Communications Community,”
D-Lib Magazine, January 1998.
(56.15 KB)
“NetBuild: Transparent Cross-Platform Access to Computational Software Libraries,”
Concurrency and Computation: Practice and Experience, Special Issue: Grid Computing Environments, vol. 14, no. 13-15, pp. 1445-1456, November 2002.
(74.84 KB)
“Netlib and NA-Net: building a scientific computing community,”
In IEEE Annals of the History of Computing (to appear), August 2007.
(352.71 KB)
“Netlib and NA-Net: Building a Scientific Computing Community,”
IEEE Annals of the History of Computing, vol. 30, no. 2, pp. 30-41, January 2008.
(352.71 KB)
“NetSolve: Grid Enabling Scientific Computing Environments,”
Grid Computing and New Frontiers of High Performance Processing, no. 14: Elsevier, 00 2005.
(425 KB)
“NetSolve: Past, Present, and Future - A Look at a Grid Enabled Server,”
Making the Global Infrastructure a Reality: Wiley Publishing, 00 2003.
(158.19 KB)
“Network-Enabled Solvers: A Step Toward Grid-Based Computing,”
SIAM News, vol. 34, no. 10, December 2001.
“New Grid Scheduling and Rescheduling Methods in the GrADS Project,”
International Journal of Parallel Programming, vol. 33, no. 2: Springer, pp. 209-229, June 2005.
(306.41 KB)
“A New Metric for Ranking High-Performance Computing Systems,”
National Science Review, vol. 3, issue 1, pp. 30-35, January 2016.
(393.55 KB)
“Non-GPU-resident Dense Symmetric Indefinite Factorization,”
Concurrency and Computation: Practice and Experience, November 2016.
“A Not So Simple Matter of Software,”
NCSA Access Online: NCSA, 00 2005.
(457.69 KB)
“A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks,”
Supercomputing '12 (poster), Salt Lake City, Utah, November 2012.
“A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks,”
International Journal of High Performance Computing Applications, vol. 28, issue 2, pp. 196-209, May 2014.
(1.74 MB)
“Numerical Algorithms for High-Performance Computational Science,”
Philosophical Transactions of the Royal Society A, vol. 378, issue 2166, 2020.
(724.37 KB)
“Numerical eigen-spectrum slicing, accurate orthogonal eigen-basis, and mixed-precision eigenvalue refinement using OpenMP data-dependent tasks and accelerator offload,”
The International Journal of High Performance Computing Applications, vol. 303, issue 136, September 2024.
“Numerical Libraries and The Grid,”
International Journal of High Performance Applications and Supercomputing, vol. 15, no. 4, pp. 359-374, January 2001.
(67.09 KB)
“Numerical Libraries and Tools for Scalable Parallel Cluster Computing,”
International Journal of High Performance Applications and Supercomputing, vol. 15, no. 2, pp. 175-180, October 2002.
(37.38 KB)
“Numerical Libraries and Tools for Scalable Parallel Cluster Computing,”
International Journal of High Performance Applications and Supercomputing, vol. 15, no. 2, pp. 175-180, January 2001.
(37.38 KB)
“Numerical Libraries and Tools for Scalable Parallel Cluster Computing,”
IEEE Cluster Computing BOF at SC99, Portland, Oregon, January 1999.
(37.38 KB)
“Numerical Linear Algebra,”
Encyclopedia of Computer Science and Technology, eds. Kent, A., Williams, J., vol. 41, pp. 207-233, August 1999.
(262 KB)
“Numerical Linear Algebra Algorithms and Software,”
Journal of Computational and Applied Mathematics, vol. 123, no. 1-2, pp. 489-514, October 1999.
(258.62 KB)
“A Numerical Linear Algebra Problem Solving Environment Designer's Perspective (LAPACK Working Note 139),”
SIAM Annual Meeting, Atlanta, GA, May 1999.
(319.71 KB)
“OMPIO: A Modular Software Architecture for MPI I/O,”
18th EuroMPI, Santorini, Greece, Springer, pp. 81-89, September 2011.
“OpenMP application experiences: Porting to accelerated nodes,”
Parallel Computing, vol. 109, March 2022.
“Optimal Checkpointing Strategies for Iterative Applications,”
IEEE Transactions on Parallel Distributed Systems, vol. 33, issue 3, pp. 507-522, March 2022.
(1.47 MB)
“Optimization and Performance Evaluation of the IDR Iterative Krylov Solver on GPUs,”
The International Journal of High Performance Computing Applications, vol. 32, no. 2, pp. 220–230, March 2018.
(2.08 MB)
“Optimization Problem Solving System Using GridRPC,”
IEEE Transactions on Parallel and Distributed Systems (submitted), January 2005.
(740.57 KB)
“Optimization System Using Grid RPC,”
Meeting of the Japan Society of Mechanical Engineers, Kyoto University, Kyoto, Japan, October 2002.
“Optimizing Matrix Multiplication for a Short-Vector SIMD Architecture - CELL Processor,”
Parallel Computing, vol. 35, pp. 138-150, 00 2009.
(591.16 KB)
“Optimizing Memory-Bound Numerical Kernels on GPU Hardware Accelerators,”
VECPAR 2012, Kobe, Japan, July 2012.
(737.28 KB)
“Overhead of Using Spare Nodes,”
The International Journal of High Performance Computing Applications, February 2020.
(2.15 MB)
“An Overview of Heterogeneous High Performance and Grid Computing,”
Engineering the Grid (to appear): Nova Science Publishers, Inc., 00 2004.
(199.93 KB)
“Overview of High Performance Computers,”
Handbook of Massive Data Sets: Kluwer Academic Publishers, pp. 791-852, January 2001.
(442.71 KB)
“PAPI Software-Defined Events for in-Depth Performance Analysis,”
The International Journal of High Performance Computing Applications, vol. 33, issue 6, pp. 1113-1127, November 2019.
(442.39 KB)
“PAPI-V: Performance Monitoring for Virtual Machines,”
CloudTech-HPC 2012, Pittsburgh, PA, September 2012.
(2.69 MB)
“Parallel algebraic domain decomposition solver for the solution of augmented systems.,”
Parallel, Distributed, Grid and Cloud Computing for Engineering, Ajaccio, Corsica, France, 12-15 April, 00 2011.
“Parallel and Distributed Scientific Computing: A Numerical Linear Algebra Problem Solving Environment Designer's Perspective,”
Handbook on Parallel and Distributed Processing, January 1999.
(323.01 KB)
“Parallel Band Two-Sided Matrix Bidiagonalization for Multicore Architectures,”
IEEE Transactions on Parallel and Distributed Systems, pp. 417-423, April 2010.
(208.16 KB)
“Parallel Band Two-Sided Matrix Bidiagonalization for Multicore Architectures,”
IEEE Transactions on Parallel and Distributed Systems (to appear), May 2009.
(208.16 KB)
“Parallel Dense Linear Algebra Software in the Multicore Era,”
in Cyberinfrastructure Technologies and Applications: Nova Science Publishers, Inc., pp. 9-24, 00 2009.
“A Parallel Implementation of the Nonsymmetric QR Algorithm for Disitributed Memory Architectures,”
SIAM Journal on Scientific Computing, vol. 16, no. 2, pp. 284-311, October 2002.
(224.7 KB)
“