Publications

Conference Paper

Dong, T., A. Haidar, S. Tomov, and J. Dongarra, “A Fast Batched Cholesky Factorization on a GPU,” International Conference on Parallel Processing (ICPP-2014), Minneapolis, MN, September 2014.

Abdelfattah, A., S. Tomov, and J. Dongarra, “Fast Batched Matrix Multiplication for Small Sizes using Half Precision Arithmetic on GPUs,” 33rd IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, IEEE, May 2019.

Wang, L., W. Wu, J. Zhang, H. Liu, G. Bosilca, M. Herlihy, and R. Fonseca, “FFT-Based Gradient Sparsification for the Distributed Training of Deep Neural Networks,” 29th International Symposium on High-Performance Parallel and Distributed Computing (HPDC 20), Stockholm, Sweden, ACM, June 2020.

Anzt, H., G. Collins, J. Dongarra, G. Flegar, and E. S. Quintana-Orti, “Flexible Batched Sparse Matrix-Vector Product on GPUs,” 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA '17), Denver, CO, ACM Press, November 2017.

Cao, Q., G. Bosilca, W. Wu, D. Zhong, A. Bouteiller, and J. Dongarra, “Flexible Data Redistribution in a Task-Based Runtime System,” IEEE International Conference on Cluster Computing (Cluster 2020), Kobe, Japan, IEEE, September 2020.

Haidar, A., A. YarKhan, C. Cao, P. Luszczek, S. Tomov, and J. Dongarra, “Flexible Linear Algebra Development and Scheduling with Cholesky Factorization,” 17th IEEE International Conference on High Performance Computing and Communications, Newark, NJ, August 2015.

Haidar, A., T. Dong, S. Tomov, P. Luszczek, and J. Dongarra, “Framework for Batched and GPU-resident Factorization Algorithms to Block Householder Transformations,” ISC High Performance, Frankfurt, Germany, Springer, July 2015.

Cao, Q., R. Alomairy, Y. Pei, G. Bosilca, H. Ltaief, D. Keyes, and J. Dongarra, “A Framework to Exploit Data Sparsity in Tile Low-Rank Cholesky Factorization,” IEEE International Parallel and Distributed Processing Symposium (IPDPS), July 2022.

Bosilca, G., A. Bouteiller, A. Danalis, T. Herault, and J. Dongarra, “From Serial Loops to Parallel Execution on Distributed Systems,” International European Conference on Parallel and Distributed Computing (Euro-Par '12), Rhodes, Greece, August 2012.

Conference Proceedings

Bosilca, G., A. Bouteiller, A. Guermouche, T. Herault, Y. Robert, P. Sens, and J. Dongarra, “Failure Detection and Propagation in HPC Systems,” Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16), Salt Lake City, Utah, IEEE Press, pp. 27:1-27:11, November 2016.

Bouteiller, A., and F. Desprez, “Fault Tolerance Management for a Hierarchical GridRPC Middleware,” 8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2008), Lyon, France, January 2008.

Fagg, G., E. Gabriel, Z. Chen, T. Angskun, G. Bosilca, A. Bukovsky, and J. Dongarra, “Fault Tolerant Communication Library and Applications for High Performance Computing,” Los Alamos Computer Science Institute (LACSI) Symposium 2003 (presented), Santa Fe, NM, October 2003.

Chen, Z., G. Fagg, E. Gabriel, J. Langou, T. Angskun, G. Bosilca, and J. Dongarra, “Fault Tolerant High Performance Computing by a Coding Approach,” Proceedings of ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (to appear), Chicago, Illinois, January 2005.

Fagg, G., A. Bukovsky, and J. Dongarra, “Fault Tolerant MPI for the HARNESS Meta-Computing System,” Proceedings of International Conference of Computational Science - ICCS 2001, Lecture Notes in Computer Science, vol. 2073, Berlin, Springer Verlag, pp. 355-366, 2001.

Gabriel, E., G. Fagg, A. Bukovsky, T. Angskun, and J. Dongarra, “A Fault-Tolerant Communication Library for Grid Environments,” 17th Annual ACM International Conference on Supercomputing (ICS'03) International Workshop on Grid Computing and e-Science, San Francisco, June 2003.

Song, F., S. Moore, and J. Dongarra, “Feedback-Directed Thread Scheduling with Memory Considerations,” IEEE International Symposium on High Performance Distributed Computing, Monterey Bay, CA, June 2007.

Bosilca, G., A. Bouteiller, A. Danalis, M. Faverge, A. Haidar, T. Herault, J. Kurzak, J. Langou, P. Lemarinier, H. Ltaief, et al., “Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA,” Proceedings of the Workshops of the 25th IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2011 Workshops), Anchorage, Alaska, USA, IEEE, pp. 1432-1441, May 2011.

Tang, C., A. Bouteiller, T. Herault, M. G. Venkata, and G. Bosilca, “From MPI to OpenSHMEM: Porting LAMMPS,” OpenSHMEM and Related Technologies. Experiences, Implementations, and Technologies, Annapolis, MD, USA, Springer International Publishing, pp. 121–137, 2015.

Fagg, G., and J. Dongarra, “FT-MPI: Fault Tolerant MPI, Supporting Dynamic Applications in a Dynamic World,” Lecture Notes in Computer Science: Proceedings of EuroPVM-MPI 2000, vol. 1908, Hungary, Springer Verlag, pp. 346-353, January 2000.

Journal Article

Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Factorization and Inversion of a Million Matrices using GPUs: Challenges and Countermeasures,” Procedia Computer Science, vol. 108, pp. 606–615, June 2017.

Bosilca, G., A. Bouteiller, A. Guermouche, T. Herault, Y. Robert, P. Sens, and J. Dongarra, “A Failure Detector for HPC Platforms,” The International Journal of High Performance Computing Applications, vol. 32, issue 1, pp. 139–158, January 2018.

Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Fast Cholesky Factorization on GPUs for Batch and Native Modes in MAGMA,” Journal of Computational Science, vol. 20, pp. 85–93, May 2017.

Losada, N., P. González, M. J. Martín, G. Bosilca, A. Bouteiller, and K. Teranishi, “Fault Tolerance of MPI Applications in Exascale Systems: The ULFM Solution,” Future Generation Computer Systems, vol. 106, pp. 467-481, May 2020.

Anzt, H., J. Dongarra, and E. S. Quintana-Orti, “Fine-grained Bit-Flip Protection for Relaxation Methods,” Journal of Computational Science, November 2016.

Fagg, G., J. Pjesivac–Grbovic, G. Bosilca, T. Angskun, and J. Dongarra, “Flexible collective communication tuning architecture applied to Open MPI,” 2006 Euro PVM/MPI (submitted), Bonn, Germany, January 2006.

Kabir, K., A. Haidar, S. Tomov, A. Bouteiller, and J. Dongarra, “A Framework for Out of Memory SVD Algorithms,” ISC High Performance 2017, pp. 158–178, June 2017.

Du, P., R. Weber, P. Luszczek, S. Tomov, G. D. Peterson, and J. Dongarra, “From CUDA to OpenCL: Towards a Performance-portable Solution for Multi-platform GPU Programming,” Parallel Computing, vol. 38, no. 8, pp. 391-407, August 2012.

Dewolfs, D., J. Broeckhove, V. Sunderam, and G. Fagg, “FT-MPI, Fault-Tolerant Metacomputing and Generic Name Services: A Case Study,” Lecture Notes in Computer Science, vol. 4192, no. ICL-UT-06-14: Springer Berlin / Heidelberg, pp. 133-140, 2006.

“The Future of Supercomputing: An Interim Report,” National Research Council, Washington, D.C., The National Academies Press, January 2003.

Poster

Tomov, S., A. Haidar, A. Ayala, D. Schultz, and J. Dongarra, FFT-ECP Fast Fourier Transform, Houston, TX, 2019 ECP Annual Meeting (Research Poster), January 2019.

Presentation

Anzt, H., G. Collins, J. Dongarra, G. Flegar, and E. S. Quintana-Orti, Flexible Batched Sparse Matrix Vector Product on GPUs, Denver, Colorado, ScalA'17: 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, November 2017.

Tomov, S., and J. Dongarra, The Future of Computing: Software Libraries, Savannah, GA, DOD CREATE Developers' Review, Keynote Presentation, February 2012.

Tech Report

Alvaro, W., J. Kurzak, and J. Dongarra, “Fast and Small Short Vector SIMD Matrix Multiplication Kernels for the CELL Processor,” University of Tennessee Computer Science Technical Report, no. UT-CS-08-609, (also LAPACK Working Note 189), January 2008.

Agullo, E., C. Augonnet, J. Dongarra, H. Ltaief, R. Namyst, S. Thibault, and S. Tomov, “Faster, Cheaper, Better - A Hybridization Methodology to Develop Linear Algebra Software for GPUs,” LAPACK Working Note, no. 230, 2010.

Dongarra, J., T. Herault, and Y. Robert, “Fault Tolerance Techniques for High-performance Computing,” University of Tennessee Computer Science Technical Report (also LAWN 289), no. UT-EECS-15-734: University of Tennessee, May 2015.

Ayala, A., S. Tomov, P. Luszczek, S. Cayrols, G. Ragghianti, and J. Dongarra, “FFT Benchmark Performance Experiments on Systems Targeting Exascale,” ICL Technical Report, no. ICL-UT-22-02, March 2022.

Tomov, S., A. Ayala, A. Haidar, and J. Dongarra, “FFT-ECP API and High-Performance Library Prototype for 2-D and 3-D FFTs on Large-Scale Heterogeneous Systems with GPUs,” ECP Milestone Report, no. FFT-ECP STML13-27: Innovative Computing Laboratory, University of Tennessee, January 2020.

Tomov, S., A. Haidar, A. Ayala, H. Shaiek, and J. Dongarra, “FFT-ECP Implementation Optimizations and Features Phase,” Innovative Computing Laboratory Technical Report, no. ICL-UT-19-12: University of Tennessee, October 2019.

Dongarra, J., and V. Eijkhout, “Finite-choice Algorithm Optimization in Conjugate Gradients (LAPACK Working Note 159),” University of Tennessee Computer Science Technical Report, UT-CS-03-502, January 2003.

Jagode, H., A. Danalis, and J. Dongarra, “Formulation of Requirements for New PAPI++ Software Package: Part I: Survey Results,” PAPI++ Working Notes, no. 1, ICL-UT-20-02: Innovative Computing Laboratory, University of Tennessee Knoxville, January 2020.

Kurzak, J., and J. Dongarra, “Fully Dynamic Scheduler for Numerical Computing on Multicore Processors,” University of Tennessee Computer Science Department Technical Report, UT-CS-09-643 (Also LAPACK Working Note 220), 2009.
