Publications

Bosilca, G., R. Delmas, J. Dongarra, and J. Langou, “Algorithmic Based Fault Tolerance Applied to High Performance Computing,” University of Tennessee Computer Science Technical Report, UT-CS-08-620 (also LAPACK Working Note 205), January 2008.

(313.55 KB)

Dongarra, J., G. Bosilca, R. Delmas, and J. Langou, “Algorithmic Based Fault Tolerance Applied to High Performance Computing,” Journal of Parallel and Distributed Computing, vol. 69, pp. 410-416, 00 2009.

(313.55 KB)

Boulet, P., J. Dongarra, F. Rastello, Y. Robert, and F. Vivien, “Algorithmic Issues on Heterogeneous Computing Platforms,” Parallel Processing Letters, vol. 9, no. 2, pp. 197-213, January 1999.

(301.17 KB)

Petitet, A., and J. Dongarra, “Algorithmic Redistribution Methods for Block Cyclic Decompositions,” IEEE Transactions on Parallel and Distributed Computing, vol. 10, no. 12, pp. 201-220, October 2002.

(524.82 KB)

Donfack, S., J. Dongarra, M. Faverge, M. Gates, J. Kurzak, P. Luszczek, and I. Yamazaki, “On Algorithmic Variants of Parallel Gaussian Elimination: Comparison of Implementations in Terms of Performance and Numerical Properties,” University of Tennessee Computer Science Technical Report, no. UT-CS-13-715, July 2013, 2012.

(358.98 KB)

Masliah, I., A. Abdelfattah, A. Haidar, S. Tomov, M. Baboulin, J. Falcou, and J. Dongarra, “Algorithms and Optimization Techniques for High-Performance Matrix-Matrix Multiplications of Very Small Matrices,” Innovative Computing Laboratory Technical Report, no. ICL-UT-18-09: Innovative Computing Laboratory, University of Tennessee, September 2018.

(3.74 MB)

Masliah, I., A. Abdelfattah, A. Haidar, S. Tomov, M. Baboulin, J. Falcou, and J. Dongarra, “Algorithms and Optimization Techniques for High-Performance Matrix-Matrix Multiplications of Very Small Matrices,” Parallel Computing, vol. 81, pp. 1–21, January 2019.

(3.27 MB)

Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Analysis and Design Techniques towards High-Performance and Energy-Efficient Dense Linear Solvers on GPUs,” IEEE Transactions on Parallel and Distributed Systems, vol. 29, issue 12, pp. 2700–2712, December 2018.

(2.53 MB)

Haidar, A., H. Ltaeif, A. YarKhan, and J. Dongarra, “Analysis of Dynamically Scheduled Tile Algorithms for Dense Linear Algebra on Multicore Architectures,” Submitted to Concurrency and Computations: Practice and Experience, November 2010.

(1.65 MB)

Haidar, A., H. Ltaeif, A. YarKhan, and J. Dongarra, “Analysis of Dynamically Scheduled Tile Algorithms for Dense Linear Algebra on Multicore Architectures,” University of Tennessee Computer Science Technical Report, UT-CS-11-666, (also Lawn 243), March 2011.

(1.65 MB)

Ayala, A., S. Tomov, P. Luszczek, S. Cayrols, G. Ragghianti, and J. Dongarra, “Analysis of the Communication and Computation Cost of FFT Libraries towards Exascale,” ICL Technical Report, no. ICL-UT-22-07: Innovative Computing Laboratory, July 2022.

(5.91 MB)

Luszczek, P., and J. Dongarra, “Analysis of Various Scalar, Vector, and Parallel Implementations of RandomAccess,” Innovative Computing Laboratory (ICL) Technical Report, no. ICL-UT-10-03, June 2010.

(226.9 KB)

Song, F., S. Moore, and J. Dongarra, “Analytical Modeling and Optimization for Affinity Based Thread Scheduling on Multicore Systems,” IEEE Cluster 2009, New Orleans, August 2009.

(395.53 KB)

Song, F., S. Moore, and J. Dongarra, “Analytical Modeling for Affinity-Based Thread Scheduling on Multicore Platforms,” University of Tennessee Computer Science Technical Report, UT-CS-08-626, January 2008.

(650.75 KB)

Yamazaki, I., A. Abdelfattah, A. Ida, S. Ohshima, S. Tomov, R. Yokota, and J. Dongarra, “Analyzing Performance of BiCGStab with Hierarchical Matrix on GPU Clusters,” IEEE International Parallel and Distributed Processing Symposium (IPDPS), Vancouver, BC, Canada, IEEE, May 2018.

(1.37 MB)

Luszczek, P., and J. Dongarra, “Anatomy of a Globally Recursive Embedded LINPACK Benchmark,” 2012 IEEE High Performance Extreme Computing Conference, Waltham, MA, pp. 1-6, September 2012.

(204.74 KB)

Eidson, T., J. Dongarra, and V. Eijkhout, “Applying Aspect-Oriented Programming Concepts to a Component-based Programming Model,” IPDPS 2003, Workshop on NSF-Next Generation Software, Nice, France, March 2003.

(66.99 KB)

Hendrickson, B., P. Messina, B. Bland, J. Chen, P. Colella, E. Dart, J. Dongarra, T. Dunning, I. Foster, R. Gerber, et al., ASCR@40: Four Decades of Department of Energy Leadership in Advanced Scientific Computing Research : Advanced Scientific Computing Advisory Committee (ASCAC), US Department of Energy, August 2020.

Hendrickson, B., P. Messina, B. Bland, J. Chen, P. Colella, E. Dart, J. Dongarra, T. Dunning, I. Foster, R. Gerber, et al., ASCR@40: Highlights and Impacts of ASCR’s Programs : US Department of Energy’s Office of Advanced Scientific Computing Research, June 2020.

Herrmann, J., G. Bosilca, T. Herault, L. Marchal, Y. Robert, and J. Dongarra, “Assessing the Cost of Redistribution followed by a Computational Kernel: Complexity and Performance Results,” Parallel Computing, vol. 52, pp. 22-41, February 2016.

(2.06 MB)

Bosilca, G., A. Bouteiller, T. Herault, Y. Robert, and J. Dongarra, “Assessing the impact of ABFT and Checkpoint composite strategies,” University of Tennessee Computer Science Technical Report, no. ICL-UT-13-03, 2013.

(968.47 KB)

Bosilca, G., A. Bouteiller, T. Herault, Y. Robert, and J. Dongarra, “Assessing the Impact of ABFT and Checkpoint Composite Strategies,” 16th Workshop on Advances in Parallel and Distributed Computational Models, IPDPS 2014, Phoenix, AZ, IEEE, May 2014.

(1.02 MB)

Emad, N., S. A. S. Fazeli, and J. Dongarra, “An Asynchronous Algorithm on NetSolve Global Computing System,” PRiSM - Laboratoire de recherche en informatique, Université de Versailles St-Quentin Technical Report, March 2004.

(377.33 KB)

Dongarra, J., N. Emad, and S. Abolfazl Shahzadeh-Fazeli, “An Asynchronous Algorithm on NetSolve Global Computing System,” Future Generation Computer Systems, vol. 22, issue 3, pp. 279-290, February 2006.

(568.92 KB)

Chow, E., H. Anzt, and J. Dongarra, “Asynchronous Iterative Algorithm for Computing Incomplete Factorizations on GPUs,” International Supercomputing Conference (ISC 2015), Frankfurt, Germany, July 2015.

Lopez, F., E. Chow, S. Tomov, and J. Dongarra, “Asynchronous SGD for DNN Training on Shared-Memory Parallel Architectures,” Innovative Computing Laboratory Technical Report, no. ICL-UT-20-04: University of Tennessee, Knoxville, March 2020.

(188.51 KB)

Lopez, F., E. Chow, S. Tomov, and J. Dongarra, “Asynchronous SGD for DNN Training on Shared-Memory Parallel Architectures,” Workshop on Scalable Deep Learning over Parallel And Distributed Infrastructures (ScaDL 2020), May 2020.

(188.51 KB)

Berry, M., and J. Dongarra, “Atlanta Organizers Put Mathematics to Work For the Math Sciences Community,” SIAM News, vol. 32, no. 6, January 1999.

(45.98 KB)

Seymour, K., H. You, and J. Dongarra, “ATLAS on the BlueGene/L – Preliminary Results,” ICL Technical Report, no. ICL-UT-06-10, January 2006.

(46.19 KB)

Whaley, C., A. Petitet, and J. Dongarra, “Automated Empirical Optimization of Software and the ATLAS Project,” Parallel Computing, vol. 27, no. 1-2, pp. 3-25, January 2001.

(370.71 KB)

Whaley, C., A. Petitet, and J. Dongarra, “Automated Empirical Optimizations of Software and the ATLAS Project (LAPACK Working Note 147),” University of Tennessee Computer Science Department Technical Report,, no. UT-CS-00-448, September 2000.

(373.69 KB)

You, H., K. Seymour, J. Dongarra, and S. Moore, “Automated Empirical Tuning of a Multiresolution Analysis Kernel,” ICL Technical Report, no. ICL-UT-07-01, pp. 10, January 2007.

(120.7 KB)

Wolf, F., B. Mohr, J. Dongarra, and S. Moore, “Automatic analysis of inefficiency patterns in parallel applications,” Concurrency and Computation: Practice and Experience, Special issue "Automatic Performance Analysis" (submitted), 00 2005.

(233.31 KB)

Wolf, F., B. Mohr, J. Dongarra, and S. Moore, “Automatic Analysis of Inefficiency Patterns in Parallel Applications,” Concurrency and Computation: Practice and Experience, vol. 19, no. 11, pp. 1481-1496, August 2007.

(233.31 KB)

Yi, Q., K. Kennedy, H. You, K. Seymour, and J. Dongarra, “Automatic Blocking of QR and LU Factorizations for Locality,” 2nd ACM SIGPLAN Workshop on Memory System Performance (MSP 2004), Washington, DC, ACM, June 2004.

(212.77 KB)

Bhatia, N., F. Song, F. Wolf, J. Dongarra, B. Mohr, and S. Moore, “Automatic Experimental Analysis of Communication Patterns in Virtual Topologies,” In Proceedings of the International Conference on Parallel Processing, Oslo, Norway, IEEE Computer Society, June 2005.

(227.13 KB)

Cuenca, J., D. Giminez, J. González, J. Dongarra, and K. Roche, “Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load,” EuroPar 2002, Paderborn, Germany, August 2002.

(92.59 KB)

Seymour, K., and J. Dongarra, “Automatic Translation of Fortran to JVM Bytecode,” Joint ACM Java Grande - ISCOPE 2001 Conference (submitted), Stanford University, California, June 2001.

(185.8 KB)

Seymour, K., and J. Dongarra, “Automatic Translation of Fortran to JVM Bytecode,” Concurrency and Computation: Practice and Experience, vol. 15, no. 3-5, pp. 202-207, 00 2003.

(185.8 KB)

Vadhiyar, S., G. Fagg, and J. Dongarra, “Automatically Tuned Collective Communications,” Proceedings of SuperComputing 2000 (SC'2000), Dallas, TX, November 2000.

(232.69 KB)

Whaley, C., and J. Dongarra, “Automatically Tuned Linear Algebra Software,” 1998 ACM/IEEE conference on Supercomputing (SC '98), Orlando, FL, IEEE Computer Society, November 1998.

Mucci, P., J. Dongarra, R. Kufrin, S. Moore, F. Song, and F. Wolf, “Automating the Large-Scale Collection and Analysis of Performance,” 5th LCI International Conference on Linux Clusters: The HPC Revolution, Austin, Texas, May 2004.

(511.6 KB)

Gates, M., J. Kurzak, P. Luszczek, Y. Pei, and J. Dongarra, “Autotuning Batch Cholesky Factorization in CUDA with Interleaved Layout of Matrices,” Parallel and Distributed Processing Symposium Workshops (IPDPSW), Orlando, FL, IEEE, June 2017.

Nath, R., S. Tomov, E. Agullo, and J. Dongarra, Autotuning Dense Linear Algebra Libraries on GPUs , Basel, Switzerland, Sixth International Workshop on Parallel Matrix Algorithms and Applications (PMAA 2010), June 2010.

(579.44 KB)

Kurzak, J., S. Tomov, and J. Dongarra, “Autotuning GEMM Kernels for the Fermi GPU,” IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 11, November 2012.

(742.5 KB)

Kurzak, J., S. Tomov, and J. Dongarra, “Autotuning GEMMs for Fermi,” University of Tennessee Computer Science Technical Report, UT-CS-11-671, (also Lawn 245), April 2011.

(397.45 KB)

Balaprakash, P., J. Dongarra, T. Gamblin, M. Hall, J. Hollingsworth, B. Norris, and R. Vuduc, “Autotuning in High-Performance Computing Applications,” Proceedings of the IEEE, vol. 106, issue 11, pp. 2068–2083, November 2018.

(2.5 MB)

Dongarra, J., M. Gates, J. Kurzak, P. Luszczek, and Y. Tsai, “Autotuning Numerical Dense Linear Algebra for Batched Computation With GPU Hardware Accelerators,” Proceedings of the IEEE, vol. 106, issue 11, pp. 2040–2055, November 2018.

(2.53 MB)

Luszczek, P., J. Kurzak, I. Yamazaki, D. Keffer, V. Maroulas, and J. Dongarra, “Autotuning Techniques for Performance-Portable Point Set Registration in 3D,” Supercomputing Frontiers and Innovations, vol. 5, no. 4, December 2018.

(720.15 KB)