Publications

Kabir, K., A. Haidar, S. Tomov, and J. Dongarra, “Performance Analysis and Design of a Hessenberg Reduction using Stabilized Blocked Elementary Transformations for New Architectures,” The Spring Simulation Multi-Conference 2015 (SpringSim'15), Best Paper Award, Alexandria, VA, April 2015.

Haugen, B., “Performance Analysis and Modeling of Task-Based Runtimes,” PhD dissertation, Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, May 2016.

Kabir, K., A. Haidar, S. Tomov, and J. Dongarra, “Performance Analysis and Optimization of Two-Sided Factorization Algorithms for Heterogeneous Platform,” International Conference on Computational Science (ICCS 2015), Reykjavík, Iceland, June 2015.

Worley, P. H., J. Candy, L. Carrington, K. Huck, T. Kaiser, K. Mahinthakumar, A. D. Malony, S. Moore, D. Reed, P. C. Roth, et al., “Performance Analysis of GYRO: A Tool Evaluation,” Proceedings of the 2005 SciDAC Conference, San Francisco, CA, June 2005.

Pjesivac–Grbovic, J., T. Angskun, G. Bosilca, G. Fagg, E. Gabriel, and J. Dongarra, “Performance Analysis of MPI Collective Operations,” 4th International Workshop on Performance Modeling, Evaluation, and Optimization of Parallel and Distributed Systems (PMEO-PDS '05), Denver, Colorado, April 2005.

Pjesivac–Grbovic, J., T. Angskun, G. Bosilca, G. Fagg, E. Gabriel, and J. Dongarra, “Performance Analysis of MPI Collective Operations,” Cluster Computing Journal (to appear), January 2005.

Pjesivac–Grbovic, J., T. Angskun, G. Bosilca, G. Fagg, E. Gabriel, and J. Dongarra, “Performance Analysis of MPI Collective Operations,” Cluster Computing, vol. 10, no. 2: Springer Netherlands, pp. 127-143, June 2007.

Mohr, B., A. Kühnal, M-A. Hermanns, and F. Wolf, “Performance Analysis of One-sided Communication Mechanisms,” Mini-Symposium "Tools Support for Parallel Programming", Proceedings of Parallel Computing (ParCo), no. ICL-UT-06-07, Malaga, Spain, September 2005.

Ayala, A., S. Tomov, M. Stoyanov, A. Haidar, and J. Dongarra, “Performance Analysis of Parallel FFT on Large Multi-GPU Systems,” 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Lyon, France, IEEE, August 2022.

Marin, G., “Performance Analysis of the MPAS-Ocean Code using HPCToolkit and MIAMI,” ICL Technical Report, no. ICL-UT-14-01: University of Tennessee, February 2014.

Cao, Q., Y. Pei, T. Herault, K. Akbudak, A. Mikhalev, G. Bosilca, H. Ltaief, D. Keyes, and J. Dongarra, “Performance Analysis of Tile Low-Rank Cholesky Factorization Using PaRSEC Instrumentation Tools,” Workshop on Programming and Performance Visualization Tools (ProTools 19) at SC19, Denver, CO, ACM, November 2019.

Anzt, H., S. Tomov, and J. Dongarra, “On the performance and energy efficiency of sparse linear algebra on GPUs,” International Journal of High Performance Computing Applications, October 2016.

Haidar, A., C. Cao, I. Yamazaki, J. Dongarra, M. Gates, P. Luszczek, and S. Tomov, “Performance and Portability with OpenCL for Throughput-Oriented HPC Workloads Across Accelerators, Coprocessors, and Multicore Processors,” 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA '14), New Orleans, LA, IEEE, November 2014.

Dongarra, J., T. Herault, and Y. Robert, “Performance and Reliability Trade-offs for the Double Checkpointing Algorithm,” International Journal of Networking and Computing, vol. 4, no. 1, pp. 32-41.

Danalis, A., and H. Jagode, “Performance Application Programming Interface,” Accelerated Computing with HIP: Sun, Baruah and Kaeli, December 2022.

Dongarra, J., H. Jagode, A. Danalis, D. Barry, and V. Weaver, “Performance Application Programming Interface for Extreme-Scale Environments (PAPI-EX),” poster, 2020 NSF Cyberinfrastructure for Sustained Scientific Innovation (CSSI) Principal Investigator Meeting, Seattle, WA, 2020.

McCraw, H., “Performance Counter Monitoring for the Blue Gene/Q Architecture,” University of Tennessee Computer Science Technical Report, no. ICL-UT-12-01, 2012.

Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Performance, Design, and Autotuning of Batched GEMM for GPUs,” University of Tennessee Computer Science Technical Report, no. UT-EECS-16-739: University of Tennessee, February 2016.

Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Performance, Design, and Autotuning of Batched GEMM for GPUs,” The International Supercomputing Conference (ISC High Performance 2016), Frankfurt, Germany, June 2016.

Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Performance, Design, and Autotuning of Batched GEMM for GPUs,” High Performance Computing: 31st International Conference, ISC High Performance 2016, Frankfurt, Germany, June 19-23, 2016, Proceedings, Lecture Notes in Computer Science, vol. 9697: Springer International Publishing, pp. 21–38, 2016.

Tomov, S., W. Lu, J. Bernholc, S. Moore, and J. Dongarra, “Performance evaluation for petascale quantum simulation tools,” Proceedings of CUG09, Atlanta, GA, May 2009.

Tomov, S., W. Lu, J. Bernholc, S. Moore, and J. Dongarra, “Performance Evaluation for Petascale Quantum Simulation Tools,” Proceedings of the Cray Users' Group Meeting, Atlanta, GA, May 2010.

Canning, A., J. Dongarra, J. Langou, O. Marques, S. Tomov, C. Voemel, and L-W. Wang, “Performance evaluation of eigensolvers in nano-structure computations,” IEEE/ACM Proceedings of HPCNano SC06 (to appear), January 2006.

Donfack, S., S. Tomov, and J. Dongarra, “Performance evaluation of LU factorization through hardware counter measurements,” University of Tennessee Computer Science Technical Report, no. UT-CS-12-700, October 2012.

Mishler, D., J. Ciesko, S. Olivier, and G. Bosilca, “Performance Insights into Device-initiated RMA Using Kokkos Remote Spaces,” 2023 IEEE International Conference on Cluster Computing Workshops (CLUSTER Workshops), Santa Fe, NM, USA, IEEE, November 2023.

Hernandez, O., F. Song, B. Chapman, J. Dongarra, B. Mohr, S. Moore, and F. Wolf, “Performance Instrumentation and Compiler Optimizations for MPI/OpenMP Applications,” Second International Workshop on OpenMP, Reims, France, January 2006.

Hernandez, O., F. Song, B. Chapman, J. Dongarra, B. Mohr, S. Moore, and F. Wolf, “Performance Instrumentation and Compiler Optimizations for MPI/OpenMP Applications,” OpenMP Shared Memory Parallel Programming, Lecture Notes in Computer Science, vol. 4315: Springer Berlin / Heidelberg, 2008.

Dongarra, J., A. D. Malony, S. Moore, P. Mucci, and S. Shende, “Performance Instrumentation and Measurement for Terascale Systems,” ICCS 2003 Terascale Workshop, Melbourne, Australia, Springer, Berlin, Heidelberg, June 2003.

Benoit, A., S. Perarnau, L. Pottier, and Y. Robert, “A Performance Model to Execute Workflows on High-Bandwidth Memory Architectures,” The 47th International Conference on Parallel Processing (ICPP 2018), Eugene, OR, IEEE Computer Society Press, August 2018.

Vadhiyar, S., G. Fagg, and J. Dongarra, “Performance Modeling for Self Adapting Collective Communications for MPI,” LACSI Symposium 2001, Santa Fe, NM, October 2001.

Yamazaki, I., E. Chow, A. Bouteiller, and J. Dongarra, “Performance of Asynchronous Optimized Schwarz with One-sided Communication,” Parallel Computing, vol. 86, pp. 66-81, August 2019.

Mary, T., I. Yamazaki, J. Kurzak, P. Luszczek, S. Tomov, and J. Dongarra, “Performance of Random Sampling for Computing Low-rank Approximations of a Dense Matrix on GPUs,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, ACM, November 2015.

Dongarra, J., “Performance of Various Computers Using Standard Linear Equations Software (Linpack Benchmark Report),” University of Tennessee Computer Science Technical Report, no. CS-89-85, January 2001.

Dongarra, J., “Performance of Various Computers Using Standard Linear Equations Software (Linpack Benchmark Report),” University of Tennessee Computer Science Department Technical Report, no. CS-89-85, January 2000.

Dongarra, J., “Performance of Various Computers Using Standard Linear Equations Software (Linpack Benchmark Report),” University of Tennessee Computer Science Department Technical Report, CS-89-85, January 2004.

Dongarra, J., “Performance of Various Computers Using Standard Linear Equations Software (Linpack Benchmark Report),” University of Tennessee Computer Science Department Technical Report, UT-CS-04-526, vol. –89-95, January 2006.

Dongarra, J., “Performance of Various Computers Using Standard Linear Equations Software (Linpack Benchmark Report),” University of Tennessee Computer Science Dept. Technical Report, CS-89-85, 2007.

Dongarra, J., “Performance of Various Computers Using Standard Linear Equations Software (Linpack Benchmark Report),” University of Tennessee Computer Science Technical Report, CS-89-85, January 2008.

Dongarra, J., “Performance of Various Computers Using Standard Linear Equations Software (Linpack Benchmark Report),” University of Tennessee Computer Science Technical Report, UT-CS-89-85, 2010.

Dongarra, J., “Performance of Various Computers Using Standard Linear Equations Software (Linpack Benchmark Report),” University of Tennessee Computer Science Technical Report, no. CS-89-85, 2011.

Dongarra, J., “Performance of Various Computers Using Standard Linear Equations Software,” University of Tennessee Computer Science Technical Report, no. CS-89-85, February 2013.

Dongarra, J., “Performance of Various Computers Using Standard Linear Equations Software (Linpack Benchmark Report),” University of Tennessee Computer Science Technical Report, no. CS-89-85: University of Tennessee, June 2014.

Buttari, A., V. Eijkhout, J. Langou, and S. Filippone, “Performance Optimization and Modeling of Blocked Sparse Kernels,” ICL Technical Report, no. ICL-UT-04-05, 2004.

Abdelfattah, A., H. Ltaief, D. Keyes, and J. Dongarra, “Performance optimization of Sparse Matrix-Vector Multiplication for multi-component PDE-based applications using GPUs,” Concurrency and Computation: Practice and Experience, vol. 28, issue 12, pp. 3447-3465, May 2016.

Vadhiyar, S., “A Performance Oriented Migration Framework for the Grid,” Proceedings of the 3rd International Symposium on Cluster Computing and the Grid, Tokyo, Japan, pp. 130-137, May 2003.

Bosilca, G., A. Bouteiller, T. Herault, P. Lemarinier, N. Ohm Saengpatsa, S. Tomov, and J. Dongarra, “Performance Portability of a GPU Enabled Factorization with the DAGuE Framework,” IEEE Cluster: workshop on Parallel Programming on Accelerator Clusters (PPAC), June 2011.

Moore, S., D. Cronk, F. Wolf, A. Purkayastha, P. J. Teller, R. Araiza, G. Aguilera, and J. Nava, “Performance Profiling and Analysis of DoD Applications using PAPI and TAU,” Proceedings of DoD HPCMP UGC 2005, Nashville, TN, IEEE, June 2005.

Shende, S., A. D. Malony, A. Morris, and F. Wolf, “Performance Profiling Overhead Compensation for MPI Programs,” Proceedings of the 12th European Parallel Virtual Machine and Message Passing Interface Conference: Springer LNCS, September 2005.

Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Performance Tuning and Optimization Techniques of Fixed and Variable Size Batched Cholesky Factorization on GPUs,” International Conference on Computational Science (ICCS'16), San Diego, CA, June 2016.

Gates, M., A. Charara, A. YarKhan, D. Sukkari, M. Al Farhan, and J. Dongarra, “Performance Tuning SLATE,” SLATE Working Notes, no. 14, ICL-UT-20-01: Innovative Computing Laboratory, University of Tennessee, January 2020.