Publications
Accelerating NWChem Coupled Cluster through dataflow-based Execution,”
The International Journal of High Performance Computing Applications, vol. 32, issue 4, pp. 540--551, July 2018.
(1.68 MB)
“Accelerating NWChem Coupled Cluster through Dataflow-Based Execution,”
The International Journal of High Performance Computing Applications, pp. 1–13, January 2017.
(4.07 MB)
“Accelerating Restarted GMRES with Mixed Precision Arithmetic,”
IEEE Transactions on Parallel and Distributed Systems, June 2021.
(572.4 KB)
“Accelerating Scientific Computations with Mixed Precision Algorithms,”
Computer Physics Communications, vol. 180, issue 12, pp. 2526-2533, December 2009.
(402.69 KB)
“
“Accelerating the Reduction to Upper Hessenberg, Tridiagonal, and Bidiagonal Forms through Hybrid GPU-Based Computing,”
Parallel Computing, vol. 36, no. 12, pp. 645-654, 00 2010.
(1.39 MB)
“Accelerating the SVD Bi-Diagonalization of a Batch of Small Matrices using GPUs,”
Journal of Computational Science, vol. 26, pp. 237–245, May 2018.
(2.18 MB)
“Accelerating the SVD Two Stage Bidiagonal Reduction and Divide and Conquer Using GPUs,”
Parallel Computing, vol. 74, pp. 3–18, May 2018.
(1.34 MB)
“Accelerating Time-To-Solution for Computational Science and Engineering,”
SciDAC Review, 00 2009.
(739.11 KB)
“Acceleration of GPU-based Krylov solvers via Data Transfer Reduction,”
International Journal of High Performance Computing Applications, 2015.
“Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting,”
Concurrency and Computation: Practice and Experience, vol. 26, issue 7, pp. 1408-1431, May 2014.
(1.96 MB)
“Active Netlib: An Active Mathematical Software Collection for Inquiry-based Computational Science and Engineering Education,”
Journal of Digital Information special issue on Interactivity in Digital Libraries, vol. 2, no. 4, 00 2002.
(182.59 KB)
“Adaptive Precision in Block-Jacobi Preconditioning for Iterative Sparse Linear System Solvers,”
Concurrency and Computation: Practice and Experience, vol. 31, no. 6, pp. e4460, March 2019.
(341.54 KB)
“Adaptive Scheduling for Task Farming with Grid Middleware,”
International Journal of Supercomputer Applications and High-Performance Computing, vol. 13, no. 3, pp. 231-240, October 2002.
(461.08 KB)
“Algorithm-based Fault Tolerance for Dense Matrix Factorizations, Multiple Failures, and Accuracy,”
ACM Transactions on Parallel Computing, vol. 1, issue 2, no. 10, pp. 10:1-10:28, January 2015.
(1.14 MB)
“Algorithm-Based Fault Tolerance for Fail-Stop Failures,”
IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 12, January 2008.
(340.49 KB)
“Algorithmic Based Fault Tolerance Applied to High Performance Computing,”
Journal of Parallel and Distributed Computing, vol. 69, pp. 410-416, 00 2009.
(313.55 KB)
“Algorithmic Issues on Heterogeneous Computing Platforms,”
Parallel Processing Letters, vol. 9, no. 2, pp. 197-213, January 1999.
(301.17 KB)
“Algorithmic Redistribution Methods for Block Cyclic Decompositions,”
IEEE Transactions on Parallel and Distributed Computing, vol. 10, no. 12, pp. 201-220, October 2002.
(524.82 KB)
“Algorithms and Optimization Techniques for High-Performance Matrix-Matrix Multiplications of Very Small Matrices,”
Parallel Computing, vol. 81, pp. 1–21, January 2019.
(3.27 MB)
“Analysis and Design Techniques towards High-Performance and Energy-Efficient Dense Linear Solvers on GPUs,”
IEEE Transactions on Parallel and Distributed Systems, vol. 29, issue 12, pp. 2700–2712, December 2018.
(2.53 MB)
“Analysis of Dynamically Scheduled Tile Algorithms for Dense Linear Algebra on Multicore Architectures,”
Submitted to Concurrency and Computations: Practice and Experience, November 2010.
(1.65 MB)
“Analytical Modeling and Optimization for Affinity Based Thread Scheduling on Multicore Systems,”
IEEE Cluster 2009, New Orleans, August 2009.
(395.53 KB)
“Assessing the Cost of Redistribution followed by a Computational Kernel: Complexity and Performance Results,”
Parallel Computing, vol. 52, pp. 22-41, February 2016.
(2.06 MB)
“An Asynchronous Algorithm on NetSolve Global Computing System,”
Future Generation Computer Systems, vol. 22, issue 3, pp. 279-290, February 2006.
(568.92 KB)
“Atlanta Organizers Put Mathematics to Work For the Math Sciences Community,”
SIAM News, vol. 32, no. 6, January 1999.
(45.98 KB)
“Automated Empirical Optimization of Software and the ATLAS Project,”
Parallel Computing, vol. 27, no. 1-2, pp. 3-25, January 2001.
(370.71 KB)
“Automatic Analysis of Inefficiency Patterns in Parallel Applications,”
Concurrency and Computation: Practice and Experience, vol. 19, no. 11, pp. 1481-1496, August 2007.
(233.31 KB)
“Automatic analysis of inefficiency patterns in parallel applications,”
Concurrency and Computation: Practice and Experience, Special issue "Automatic Performance Analysis" (submitted), 00 2005.
(233.31 KB)
“Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load,”
EuroPar 2002, Paderborn, Germany, August 2002.
(92.59 KB)
“Automatic Translation of Fortran to JVM Bytecode,”
Concurrency and Computation: Practice and Experience, vol. 15, no. 3-5, pp. 202-207, 00 2003.
(185.8 KB)
“Autotuning GEMM Kernels for the Fermi GPU,”
IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 11, November 2012.
(742.5 KB)
“Autotuning in High-Performance Computing Applications,”
Proceedings of the IEEE, vol. 106, issue 11, pp. 2068–2083, November 2018.
(2.5 MB)
“Autotuning Numerical Dense Linear Algebra for Batched Computation With GPU Hardware Accelerators,”
Proceedings of the IEEE, vol. 106, issue 11, pp. 2040–2055, November 2018.
(2.53 MB)
“Autotuning Techniques for Performance-Portable Point Set Registration in 3D,”
Supercomputing Frontiers and Innovations, vol. 5, no. 4, December 2018.
(720.15 KB)
“Basic Linear Algebra Subprograms (BLAS),”
(an update), submitted to ACM TOMS, February 2001.
(228.33 KB)
“Batched matrix computations on hardware accelerators based on GPUs,”
International Journal of High Performance Computing Applications, February 2015.
(2.16 MB)
“Batched One-Sided Factorizations of Tiny Matrices Using GPUs: Challenges and Countermeasures,”
Journal of Computational Science, vol. 26, pp. 226–236, May 2018.
(3.73 MB)
“Biannual Top-500 Computer Lists Track Changing Environments for Scientific Computing,”
SIAM News, vol. 34, no. 9, October 2002.
(2.62 MB)
“Big Data and Extreme-Scale Computing: Pathways to Convergence - Toward a Shaping Strategy for a Future Software and Data Ecosystem for Scientific Inquiry,”
The International Journal of High Performance Computing Applications, vol. 32, issue 4, pp. 435–479, July 2018.
(1.29 MB)
“Biological Sequence Alignment on the Computational Grid Using the GrADS Framework,”
Future Generation Computing Systems, vol. 21, no. 6: Elsevier, pp. 980-986, June 2005.
(147.29 KB)
“BlackjackBench: Portable Hardware Characterization with Automated Results Analysis,”
The Computer Journal, March 2013.
(408.45 KB)
“Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems,”
ICCS 2012, Omaha, NE, June 2012.
(608.95 KB)
“Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems
, no. UT-CS-11-689, December 2011.
(608.95 KB)
A Block-Asynchronous Relaxation Method for Graphics Processing Units,”
Journal of Parallel and Distributed Computing, vol. 73, issue 12, pp. 1613–1626, December 2013.
(1.08 MB)
“Building and using a Fault Tolerant MPI implementation,”
International Journal of High Performance Applications and Supercomputing (to appear), 00 2004.
“Changes in Dense Linear Algebra Kernels - Decades Long Perspective,”
in Solving the Schrodinger Equation: Has everything been tried? (to appear): Imperial College Press, 00 2011.
“Checkpointing Strategies for Shared High-Performance Computing Platforms,”
International Journal of Networking and Computing, vol. 9, no. 1, pp. 28–52, 2019.
(490.5 KB)
“A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures,”
Parallel Computing (to appear), 00 2010.
(612.23 KB)
“A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures,”
Parallel Computing, vol. 35, pp. 38-53, 00 2009.
(274.74 KB)
“