Publications
SLATE Users' Guide,”
SLATE Working Notes, no. 10, ICL-UT-19-01: Innovative Computing Laboratory, University of Tennessee, July 2020.
(1.51 MB)
“Clover: Computational Libraries Optimized via Exascale Research
, Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
(872 KB)
SLATE: Software for Linear Algebra Targeting Exascale (POSTER)
, Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.
(546.56 KB)
SLATE Working Note 13: Implementing Singular Value and Symmetric/Hermitian Eigenvalue Solvers,”
SLATE Working Notes, no. 13, ICL-UT-19-07: Innovative Computing Laboratory, University of Tennessee, September 2019.
(3.47 MB)
“Accelerating Collaborative Filtering for Implicit Feedback Datasets using GPUs,”
2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, IEEE, November 2015.
(1.02 MB)
“SLATE: Design of a Modern Distributed and Accelerated Linear Algebra Library,”
International Conference for High Performance Computing, Networking, Storage and Analysis (SC19), Denver, CO, ACM, November 2019.
DOI: 10.1145/3295500.3356223 (2.01 MB)
“Accelerating Eigenvector Computation in the Nonsymmetric Eigenvalue Problem,”
VECPAR 2014, Eugene, OR, June 2014.
(199.44 KB)
“Accelerating the SVD Two Stage Bidiagonal Reduction and Divide and Conquer Using GPUs,”
Parallel Computing, vol. 74, pp. 3–18, May 2018.
DOI: 10.1016/j.parco.2017.10.004 (1.34 MB)
“Performance Tuning SLATE,”
SLATE Working Notes, no. 14, ICL-UT-20-01: Innovative Computing Laboratory, University of Tennessee, January 2020.
(1.29 MB)
“Evolution of the SLATE linear algebra library,”
The International Journal of High Performance Computing Applications, September 2024.
DOI: 10.1177/10943420241286531
“SLATE Tutorial
, Houston, TX, 2020 ECP Annual Meeting, February 2020.
(12.14 MB)
Least Squares Performance Report,”
SLATE Working Notes, no. 09, ICL-UT-18-10: Innovative Computing Laboratory, University of Tennessee, December 2018.
(1.76 MB)
“Evaluating Task Dropping Strategies for Overloaded Real-Time Systems (Work-In-Progress),”
42nd Real Time Systems Symposium (RTSS): IEEE Computer Society Press, 2021.
(217.13 KB)
“Scheduling Independent Stochastic Tasks on Heterogeneous Cloud Platforms,”
IEEE Cluster 2019, Albuquerque, New Mexico, IEEE Computer Society Press, September 2019.
(651 KB)
“BDEC2 Platform White Paper,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-19-11: University of Tennessee, September 2019.
(30.16 KB)
“Reservation and Checkpointing Strategies for Stochastic Jobs,”
34th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2020), New Orleans, LA, IEEE Computer Society Press, May 2020.
(692.4 KB)
“Evaluating The Performance Of MPI-2 Dynamic Communicators And One-Sided Communication,”
Lecture Notes in Computer Science, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 10th European PVM/MPI User's Group Meeting, vol. 2840, Venice, Italy, Springer-Verlag, Berlin, pp. 88-97, September 2003.
(254.08 KB)
“A Fault-Tolerant Communication Library for Grid Environments,”
17th Annual ACM International Conference on Supercomputing (ICS'03) International Workshop on Grid Computing and e-Science, San Francisco, June 2003.
(377.14 KB)
“Visualizing the Program Execution Control Flow of OpenMP Applications,”
Proc. 4th International Workshop on OpenMP (IWOMP 2008), West Lafayette, Indiana, Lecture Notes in Computer Science 5004, pp. 181-190, January 2008.
(194.25 KB)
“Detection and Analysis of Iterative Behavior in Parallel Applications,”
Proceedings of the 2008 International Conference on Computational Science (ICCS 2008), vol. 5103, Krakow, Poland, pp. 261-267, January 2008.
(141.02 KB)
“OpenMP-centric Performance Analysis of Hybrid Applications,”
Proc. 2008 IEEE International Conference on Cluster Computing (CLUSTER 2008), Tsukuba, Japan, January 2008.
(218.63 KB)
“Scalability Analysis of the SPEC OpenMP Benchmarks on Large-Scale Shared Memory Multiprocessors,”
Proceedings of the 2007 International Conference on Computational Science (ICCS 2007), vol. 4487-4490, Beijing, China, Springer LNCS, pp. 815-822, 2007.
DOI: 10.1007/978-3-540-72586-2_115 (145.84 KB)
“Continuous Runtime Profiling of OpenMP Applications,”
Proceedings of the 2007 Conference on Parallel Computing (PARCO 2007), Juelich and Aachen, Germany, January 2007.
(408.01 KB)
“Capturing and Analyzing the Execution Control Flow of OpenMP Applications,”
International Journal of Parallel Programming, vol. 37, no. 3, pp. 266-276, 00 2009.
“On Using Incremental Profiling for the Performance Analysis of Shared Memory Parallel Applications,”
Proceedings of the 13th International Euro-Par Conference on Parallel Processing (Euro-Par '07), Rennes, France, Springer LNCS, January 2007.
“Recording the Control Flow of Parallel Applications to Determine Iterative and Phase-Based Behavior,”
Future Generation Computing Systems, vol. 26, pp. 162-166, 00 2009.
“Extending MAGMA Portability with OneAPI,”
The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC22), Ninth Workshop on Accelerator Programming Using Directives (WACCPD 2022), Dallas, TX, November 2022.
(999.19 KB)
“Extending MAGMA Portability with OneAPI
, Dallas, TX, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC22), ACM Student Research Competition, November 2022.
(1.33 MB)
Experiences with Windows 95/NT as a Cluster Computing Platform for Parallel Computing,”
Parallel and Distributed Computing Practices, Special Issue: Cluster Computing, vol. 2, no. 2: Nova Science Publishers, USA, pp. 119-128, February 1999.
(164.04 KB)
“The Case for Directive Programming for Accelerator Autotuner Optimization,”
Innovative Computing Laboratory Technical Report, no. ICL-UT-17-07: University of Tennessee, October 2017.
(341.52 KB)
“Mixing LU-QR Factorization Algorithms to Design High-Performance Dense Linear Algebra Solvers,”
Journal of Parallel and Distributed Computing, vol. 85, pp. 32-46, November 2015.
DOI: doi:10.1016/j.jpdc.2015.06.007 (5.06 MB)
“Designing LU-QR hybrid solvers for performance and stability,”
University of Tennessee Computer Science Technical Report (also LAWN 282), no. ut-eecs-13-719: University of Tennessee, October 2013.
(4.11 MB)
“Designing LU-QR Hybrid Solvers for Performance and Stability,”
IPDPS 2014, Phoenix, AZ, IEEE, May 2014.
DOI: 10.1109/IPDPS.2014.108 (4.2 MB)
“Bidiagonalization and R-Bidiagonalization: Parallel Tiled Algorithms, Critical Paths and Distributed-Memory Implementation,”
IEEE International Parallel and Distributed Processing Symposium (IPDPS), Orlando, FL, IEEE, May 2017.
DOI: 10.1109/IPDPS.2017.46 (328.15 KB)
“MAGMA Templates for Scalable Linear Algebra on Emerging Architectures,”
The International Journal of High Performance Computing Applications, vol. 34, issue 6, pp. 645-658, November 2020.
DOI: 10.1177/1094342020938421
“Resilience for Stencil Computations with Latent Errors,”
International Conference on Parallel Processing (ICPP), Bristol, UK, IEEE Computer Society Press, August 2017.
(1.19 MB)
“Flexible collective communication tuning architecture applied to Open MPI,”
2006 Euro PVM/MPI (submitted), Bonn, Germany, January 2006.
(206.58 KB)
“HARNESS and Fault Tolerant MPI,”
Parallel Computing, vol. 27, no. 11, pp. 1479-1496, January 2001.
(164.2 KB)
“Fault Tolerant MPI for the HARNESS Meta-Computing System,”
Proceedings of International Conference of Computational Science - ICCS 2001, Lecture Notes in Computer Science, vol. 2073, Berlin, Springer Verlag, pp. 355-366, 00 2001.
DOI: 10.1007/3-540-45545-0_44
“Building and using a Fault Tolerant MPI implementation,”
International Journal of High Performance Applications and Supercomputing (to appear), 00 2004.
“FT-MPI: Fault Tolerant MPI, Supporting Dynamic Applications in a Dynamic World,”
Lecture Notes in Computer Science: Proceedings of EuroPVM-MPI 2000, (Hungary: Springer Verlag, 2000), pp. V1908,346-353, January 2000.
(51.95 KB)
“HARNESS Fault Tolerant MPI Design, Usage and Performance Issues,”
Future Generation Computer Systems, vol. 18, no. 8, pp. 1127-1142, January 2002.
(403.41 KB)
“Extending the MPI Specification for Process Fault Tolerance on High Performance Computing Systems,”
Proceedings of ISC2004 (to appear), Heidelberg, Germany, June 2004.
(548.38 KB)
“Fault Tolerant Communication Library and Applications for High Performance Computing,”
Los Alamos Computer Science Institute (LACSI) Symposium 2003 (presented), Santa Fe, NM, October 2003.
(146.05 KB)
“Parallel IO Support for Meta-Computing Applications: MPI_Connect IO Applied to PACX-MPI,”
8th European PVM/MPI User's Group Meeting, Lecture Notes in Computer Science, vol. 2131, Greece, Springer Verlag, Berlin, September 2001.
(129.3 KB)
“Scalable Networked Information Processing Environment (SNIPE),”
Journal on Future Generation Computer Systems, vol. 15, no. 5/6, pp. 595-605, January 1999.
(189.21 KB)
“Scalable Fault Tolerant MPI: Extending the Recovery Algorithm,”
Proceedings of 12th European Parallel Virtual Machine and Message Passing Interface Conference - Euro PVM/MPI, vol. 3666, Sorrento (Naples) , Italy, Springer-Verlag Berlin, pp. 67, September 2005.
(144.86 KB)
“Process Fault-Tolerance: Semantics, Design and Applications for High Performance Computing,”
International Journal for High Performance Applications and Supercomputing (to appear), April 2004.
(186.9 KB)
“An Asynchronous Algorithm on NetSolve Global Computing System,”
PRiSM - Laboratoire de recherche en informatique, Université de Versailles St-Quentin Technical Report, March 2004.
(377.33 KB)
“IBP - Internet Backplane Protocol: Infrastructure for Distributed Storage (V O.2),”
University of Tennessee Computer Science Department Technical Report, no. UT-CS-99-430, February 1999.
(37.72 KB)
“