Publications
The Impact of Multicore on Computational Science Software,”
CTWatch Quarterly, vol. 3, issue 1, February 2007.
“The Impact of Multicore on Math Software,”
PARA 2006, Umea, Sweden, June 2006.
(223.53 KB)
“The Impact of Paravirtualized Memory Hierarchy on Linear Algebra Computational Kernels and Software,”
ACM/IEEE International Symposium on High Performance Distributed Computing, Boston, MA., June 2008.
(403.89 KB)
“Impact of Quad-core Cray XT4 System and Software Stack on Scientific Computation,”
Euro-Par 2009, Lecture Notes in Computer Science, vol. 5704/2009, Delft, The Netherlands, Springer Berlin / Heidelberg, pp. 334-344, August 2009.
(312.74 KB)
“Impacts of Multi-GPU MPI Collective Communications on Large FFT Computation,”
Workshop on Exascale MPI (ExaMPI) at SC19, Denver, CO, November 2019.
(1.6 MB)
“Implementation and Tuning of Batched Cholesky Factorization and Solve for NVIDIA GPUs,”
IEEE Transactions on Parallel and Distributed Systems, no. 1045-9219, November 2015.
“Implementation and Usage of the PERUSE-Interface in Open MPI,”
Euro PVM/MPI 2006, Bonn, Germany, September 2006.
(310.76 KB)
“Implementation of Mixed Precision in Solving Systems of Linear Equations on the Cell Processor,”
Concurrency and Computation: Practice and Experience, vol. 19, no. 10, pp. 1371-1385, July 2007.
(453.78 KB)
“Implementation of the C++ API for Batch BLAS,”
SLATE Working Notes, no. 07, ICL-UT-18-04: Innovative Computing Laboratory, University of Tennessee, June 2018.
(1.07 MB)
“Implementation of the Mixed-Precision High Performance LINPACK Benchmark on the CELL Processor,”
University of Tennessee Computer Science Tech Report, no. UT-CS-06-580, LAPACK Working Note #177, September 2006.
(506.18 KB)
“An Implementation of the Tile QR Factorization for a GPU and Multiple CPUs,”
Applied Parallel and Scientific Computing, vol. 7133, pp. 248-257, 00 2012.
(623.5 KB)
“Implementing a Blocked Aasen’s Algorithm with a Dynamic Scheduler on Multicore Architectures,”
IPDPS 2013 (submitted), Boston, MA, 00 2013.
(1.22 MB)
“Implementing a Sparse Matrix Vector Product for the SELL-C/SELL-C-σ formats on NVIDIA GPUs,”
University of Tennessee Computer Science Technical Report, no. UT-EECS-14-727: University of Tennessee, April 2014.
(578.11 KB)
“Implementing a systolic algorithm for QR factorization on multicore clusters with PaRSEC,”
Lawn 277, no. UT-CS-13-709, May 2013.
(298.63 KB)
“Implementing Linear Algebra Routines on Multi-Core Processors with Pipelining and a Look Ahead,”
University of Tennessee Computer Science Tech Report, UT-CS-06-581, LAPACK Working Note #178, January 2006.
(304.4 KB)
“Implicit Actions and Non-blocking Failure Recovery with MPI,”
2022 IEEE/ACM 12th Workshop on Fault Tolerance for HPC at eXtreme Scale (FTXS), Dallas, TX, USA, IEEE, January 2023, 2022.
DOI: 10.1109/FTXS56515.2022.00009
“Improved Energy-Aware Strategies for Periodic Real-Time Tasks under Reliability Constraints,”
40th IEEE Real-Time Systems Symposium (RTSS 2019), York, UK, IEEE Press, February 2020.
“An Improved MAGMA GEMM for Fermi GPUs,”
University of Tennessee Computer Science Technical Report, no. UT-CS-10-655 (also LAPACK working note 227), July 2010.
(486.71 KB)
“An Improved MAGMA GEMM for Fermi GPUs,”
International Journal of High Performance Computing, vol. 24, no. 4, pp. 511-515, 00 2010.
“An Improved Parallel Singular Value Algorithm and Its Implementation for Multicore Hardware,”
Supercomputing 2013, Denver, CO, November 2013.
“An Improved Parallel Singular Value Algorithm and Its Implementation for Multicore Hardware,”
University of Tennessee Computer Science Technical Report (also LAWN 283), no. ut-eecs-13-720: University of Tennessee, October 2013.
(1.23 MB)
“Improved Runtime and Transfer Time Prediction Mechanisms in a Network Enabled Server,”
Parallel Processing Letters, vol. 17, no. 1, pp. 47-59, March 2006.
(718.4 KB)
“Improved Runtime and Transfer Time Prediction Mechanisms in a Network Enabled Servers Middleware,”
Parallel Processing Letters, vol. 17, no. 1, pp. 47-59, March 2007.
(718.4 KB)
“Improvement of parallelization efficiency of batch pattern BP training algorithm using Open MPI,”
Proceedings of International Conference on Computational Science, ICCS 2010 (to appear), Amsterdam The Netherlands, Elsevier, June 2010.
(125.01 KB)
“Improvements in the Efficient Composition of Applications,”
IPDPS 2004, NGS Workshop (to appear), Sante Fe, 00 2004.
(42.85 KB)
“Improving Performance of GMRES by Reducing Communication and Pipelining Global Collectives,”
Proceedings of The 18th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2017), Best Paper Award, Orlando, FL, June 2017.
DOI: 10.1109/IPDPSW.2017.65 (453.66 KB)
“Improving the Energy Efficiency of Sparse Linear System Solvers on Multicore and Manycore Systems,”
Philosophical Transactions of the Royal Society A -- Mathematical, Physical and Engineering Sciences, vol. 372, issue 2018, July 2014.
DOI: 10.1098/rsta.2013.0279 (779.57 KB)
“Improving the performance of CA-GMRES on multicores with multiple GPUs,”
IPDPS 2014, Phoenix, AZ, IEEE, May 2014.
(333.82 KB)
“Improving the Performance of the GMRES Method using Mixed-Precision Techniques,”
Smoky Mountains Computational Sciences & Engineering Conference (SMC2020), August 2020.
(600.33 KB)
“Improving the Scaling of an Asynchronous Many-Task Runtime with a Lightweight Communication Engine,”
52nd International Conference on Parallel Processing (ICPP 2023), Salt Lake City, Utah, ACM, September 2023.
DOI: 10.1145/3605573.3605642
“Improving Time to Solution with Automated Performance Analysis,”
Second Workshop on Productivity and Performance in High-End Computing (P-PHEC) at 11th International Symposium on High Performance Computer Architecture (HPCA-2005), San Francisco, February 2005.
(112.63 KB)
“Incomplete Sparse Approximate Inverses for Parallel Preconditioning,”
Parallel Computing, vol. 71, pp. 1–22, January 2018.
DOI: 10.1016/j.parco.2017.10.003 (1.24 MB)
“Increasing Accuracy of Iterative Refinement in Limited Floating-Point Arithmetic on Half-Precision Accelerators,”
IEEE High Performance Extreme Computing Conference (HPEC 2019), Best Paper Finalist, Waltham, MA, IEEE, September 2019.
(470.21 KB)
“Initial Integration and Evaluation of SLATE and STRUMPACK,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-18-11: University of Tennessee, December 2018.
(249.78 KB)
“Initial Integration and Evaluation of SLATE Parallel BLAS in LATTE,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-18-07: Innovative Computing Laboratory, University of Tennessee, June 2018.
(366.6 KB)
“Innovations of the NetSolve Grid Computing System,”
Concurrency: Practice and Experience, vol. 14, no. 13-15, pp. 1457-1479, January 2002.
(311.31 KB)
“Integrating Deep Learning in Domain Science at Exascale (MagmaDNN)
, virtual, DOD HPCMP seminar, December 2020.
(11.12 MB)
Integrating Deep Learning in Domain Sciences at Exascale,”
2020 Smoky Mountains Computational Sciences and Engineering Conference (SMC 2020), August 2020.
“Integrating Deep Learning in Domain Sciences at Exascale,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-20-10: University of Tennessee, August 2020.
(1.09 MB)
“Integrating process, control-flow, and data resiliency layers using a hybrid Fenix/Kokkos approach,”
2022 IEEE International Conference on Cluster Computing (CLUSTER 2022), Heidelberg, Germany, September 2022.
“Intelligent Service Trading and Brokering for Distributed Network Services in GridSolve,”
VECPAR 2010, 9th International Meeting on High Performance Computing for Computational Science, Berkeley, CA, June 2010.
(256.04 KB)
“Interactive Grid-Access Using Gridsolve and Giggle,”
Computing and Informatics, vol. 27, no. 2, pp. 233-248,ISSN1335-9150, 00 2008.
(533.4 KB)
“Interim Report on Benchmarking FFT Libraries on High Performance Systems,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-21-03: University of Tennessee, July 2021.
(2.68 MB)
“Interior State Computation of Nano Structures,”
PARA 2008, 9th International Workshop on State-of-the-Art in Scientific and Parallel Computing, Trondheim, Norway, May 2008.
(137.12 KB)
“The International Exascale Software Project: A Call to Cooperative Action by the Global High Performance Community,”
International Journal of High Performance Computing Applications (to appear), July 2009.
(203.04 KB)
“The International Exascale Software Project Roadmap,”
International Journal of High Performance Computing, vol. 25, no. 1, pp. 3-60, January 2011.
DOI: 10.1177/1094342010391989 (719.74 KB)
“International Exascale Software Project Roadmap v1.0,”
University of Tennessee Computer Science Technical Report, UT-CS-10-654, May 2010.
(719.74 KB)
“An international survey on MPI users,”
Parallel Computing, vol. 108, December 2021.
DOI: 10.1016/j.parco.2021.102853 (1.49 MB)
“The Internet BackPlane Protocol: A Study in Resource Sharing,”
Proceedings of the second IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID 2002), Berlin, Germany, October 2002.
“Internet Backplane Protocol: API 1.0,”
University of Tennessee Computer Science Technical Report, no. UT-CS-01-464, January 2001.
(55.33 KB)
“