Publications
Optimization System Using Grid RPC,”
Meeting of the Japan Society of Mechanical Engineers, Kyoto University, Kyoto, Japan, October 2002.
“Optimization Problem Solving System Using GridRPC,”
IEEE Transactions on Parallel and Distributed Systems (submitted), January 2005.
“Optimization and Performance Evaluation of the IDR Iterative Krylov Solver on GPUs,”
The International Journal of High Performance Computing Applications, vol. 32, no. 2, pp. 220–230, March 2018.
DOI: 10.1177/1094342016646844
“Optimal Checkpointing Strategies for Iterative Applications,”
IEEE Transactions on Parallel and Distributed Systems, vol. 33, issue 3, pp. 507-522, March 2022.
DOI: 10.1109/TPDS.2021.3099440
“OpenMP application experiences: Porting to accelerated nodes,”
Parallel Computing, vol. 109, March 2022.
DOI: 10.1016/j.parco.2021.102856
“OMPIO: A Modular Software Architecture for MPI I/O,”
18th EuroMPI, Santorini, Greece, Springer, pp. 81-89, September 2011.
“A Numerical Linear Algebra Problem Solving Environment Designer's Perspective (LAPACK Working Note 139),”
SIAM Annual Meeting, Atlanta, GA, May 1999.
“Numerical Linear Algebra Algorithms and Software,”
Journal of Computational and Applied Mathematics, vol. 123, no. 1-2, pp. 489-514, October 1999.
“Numerical Linear Algebra,”
Encyclopedia of Computer Science and Technology, eds. Kent, A., Williams, J., vol. 41, pp. 207-233, August 1999.
“Numerical Libraries and Tools for Scalable Parallel Cluster Computing,”
International Journal of High Performance Computing Applications, vol. 15, no. 2, pp. 175-180, January 2001.
“Numerical Libraries and Tools for Scalable Parallel Cluster Computing,”
IEEE Cluster Computing BOF at SC99, Portland, Oregon, January 1999.
“Numerical Libraries and Tools for Scalable Parallel Cluster Computing,”
International Journal of High Performance Computing Applications, vol. 15, no. 2, pp. 175-180, October 2002.
“Numerical Libraries and The Grid,”
International Journal of High Performance Computing Applications, vol. 15, no. 4, pp. 359-374, January 2001.
“Numerical eigen-spectrum slicing, accurate orthogonal eigen-basis, and mixed-precision eigenvalue refinement using OpenMP data-dependent tasks and accelerator offload,”
The International Journal of High Performance Computing Applications, vol. 303, issue 136, September 2024.
DOI: 10.1177/10943420241281050
“Numerical Algorithms for High-Performance Computational Science,”
Philosophical Transactions of the Royal Society A, vol. 378, issue 2166, 2020.
DOI: 10.1098/rsta.2019.0066
“A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks,”
Supercomputing '12 (poster), Salt Lake City, Utah, November 2012.
“A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks,”
International Journal of High Performance Computing Applications, vol. 28, issue 2, pp. 196-209, May 2014.
DOI: 10.1177/1094342013502097
“A Not So Simple Matter of Software,”
NCSA Access Online: NCSA, 2005.
“Non-GPU-resident Dense Symmetric Indefinite Factorization,”
Concurrency and Computation: Practice and Experience, November 2016.
DOI: 10.1002/cpe.4012
“A New Metric for Ranking High-Performance Computing Systems,”
National Science Review, vol. 3, issue 1, pp. 30-35, January 2016.
DOI: 10.1093/nsr/nwv084
“New Grid Scheduling and Rescheduling Methods in the GrADS Project,”
International Journal of Parallel Programming, vol. 33, no. 2: Springer, pp. 209-229, June 2005.
“Network-Enabled Solvers: A Step Toward Grid-Based Computing,”
SIAM News, vol. 34, no. 10, December 2001.
“NetSolve: Past, Present, and Future - A Look at a Grid Enabled Server,”
Making the Global Infrastructure a Reality: Wiley Publishing, 2003.
“NetSolve: Grid Enabling Scientific Computing Environments,”
Grid Computing and New Frontiers of High Performance Processing, no. 14: Elsevier, 2005.
“Netlib and NA-Net: building a scientific computing community,”
IEEE Annals of the History of Computing (to appear), August 2007.
“Netlib and NA-Net: Building a Scientific Computing Community,”
IEEE Annals of the History of Computing, vol. 30, no. 2, pp. 30-41, January 2008.
“NetBuild: Transparent Cross-Platform Access to Computational Software Libraries,”
Concurrency and Computation: Practice and Experience, Special Issue: Grid Computing Environments, vol. 14, no. 13-15, pp. 1445-1456, November 2002.
“National HPCC Software Exchange (NHSE): Uniting the High Performance Computing and Communications Community,”
D-Lib Magazine, January 1998.
“NanoPSE: A Nanoscience Problem Solving Environment for Atomistic Electronic Structure of Semiconductor Nanostructures,”
Journal of Physics: Conference Series, vol. 16, pp. 277-282, June 2005.
DOI: 10.1088/1742-6596/16/1/038
“Multithreading in the PLASMA Library,”
Multi and Many-Core Processing: Architecture, Programming, Algorithms, & Applications: Taylor & Francis, 2013.
“Multi-Level Checkpointing and Silent Error Detection for Linear Workflows,”
Journal of Computational Science, vol. 28, pp. 398–415, September 2018.
“Multi-GPU work sharing in a task-based dataflow programming model,”
Future Generation Computer Systems, vol. 156, pp. 313-324, July 2024.
DOI: 10.1016/j.future.2024.03.017
“MPI Collective Algorithm Selection and Quadtree Encoding,”
Parallel Computing (Special Edition: EuroPVM/MPI 2006): Elsevier, 2007.
“MPI Collective Algorithm Selection and Quadtree Encoding,”
Lecture Notes in Computer Science, vol. 4192, no. ICL-UT-06-13: Springer Berlin / Heidelberg, pp. 40-48, September 2006.
“Model-Driven One-Sided Factorizations on Multicore, Accelerated Systems,”
Supercomputing Frontiers and Innovations, vol. 1, issue 1, 2014.
DOI: 10.14529/jsfi1401
“Mixing LU-QR Factorization Algorithms to Design High-Performance Dense Linear Algebra Solvers,”
Journal of Parallel and Distributed Computing, vol. 85, pp. 32-46, November 2015.
DOI: 10.1016/j.jpdc.2015.06.007
“Mixed-Precision Iterative Refinement using Tensor Cores on GPUs to Accelerate Solution of Linear Systems,”
Proceedings of the Royal Society A, vol. 476, issue 2243, November 2020.
DOI: 10.1098/rspa.2020.0110
“Mixed-Precision Cholesky QR Factorization and its Case Studies on Multicore CPU with Multiple GPUs,”
SIAM Journal on Scientific Computing, vol. 37, no. 3, pp. C203-C330, May 2015.
DOI: 10.1137/14M0973773
“Mixed Precision Iterative Refinement Techniques for the Solution of Dense Linear Systems,”
International Journal of High Performance Computing Applications (to appear), August 2007.
“Middleware for the Use of Storage in Communication,”
Parallel Computing, vol. 28, no. 12, pp. 1773-1788, August 2002.
“Message Passing Software Systems,”
Encyclopedia of Electrical and Electronics Engineering, Supplement 1: John Wiley & Sons, Inc., 2000.
“Measuring Computer Performance: A Practitioner's Guide,”
SIAM Review (book review), vol. 43, no. 2, pp. 383-384, 2001.
“Matrix Multiplication on Batches of Small Matrices in Half and Half-Complex Precisions,”
Journal of Parallel and Distributed Computing, vol. 145, pp. 188-201, November 2020.
DOI: 10.1016/j.jpdc.2020.07.001
“Matrices Over Runtime Systems at Exascale,”
Supercomputing '12 (poster), Salt Lake City, Utah, November 2012.
“Materials fingerprinting classification,”
Computer Physics Communications, pp. 108019, May Jan.
DOI: 10.1016/j.cpc.2021.108019
“The Marketplace for High-Performance Computers,”
Parallel Computing, vol. 25, no. 13-14, pp. 1517-1545, October 2002.
“MaPHyS or the Development of a Parallel Algebraic Domain Decomposition Solver in the Course of the Solstice Project,”
Sparse Days 2010 Meeting at CERFACS, Toulouse, France, June 2010.
“MAGMA Templates for Scalable Linear Algebra on Emerging Architectures,”
The International Journal of High Performance Computing Applications, vol. 34, issue 6, pp. 645-658, November 2020.
DOI: 10.1177/1094342020938421
“MAGMA: Enabling exascale performance with accelerated BLAS and LAPACK for diverse GPU architectures,”
The International Journal of High Performance Computing Applications, June 2024.
DOI: 10.1177/10943420241261960
“LU Factorization with Partial Pivoting for a Multicore System with Accelerators,”
IEEE Transactions on Parallel and Distributed Systems, vol. 24, issue 8, pp. 1613-1621, August 2013.
DOI: 10.1109/TPDS.2012.242
“