Publications

Dongarra, J., “Performance of Various Computers Using Standard Linear Equations Software (Linpack Benchmark Report),” University of Tennessee Computer Science Technical Report, UT-CS-89-85, 2010.

Mary, T., I. Yamazaki, J. Kurzak, P. Luszczek, S. Tomov, and J. Dongarra, “Performance of Random Sampling for Computing Low-rank Approximations of a Dense Matrix on GPUs,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, ACM, November 2015.

Yamazaki, I., E. Chow, A. Bouteiller, and J. Dongarra, “Performance of Asynchronous Optimized Schwarz with One-sided Communication,” Parallel Computing, vol. 86, pp. 66-81, August 2019.

Vadhiyar, S., G. Fagg, and J. Dongarra, “Performance Modeling for Self Adapting Collective Communications for MPI,” LACSI Symposium 2001, Santa Fe, NM, October 2001.

Dongarra, J., A. D. Malony, S. Moore, P. Mucci, and S. Shende, “Performance Instrumentation and Measurement for Terascale Systems,” ICCS 2003 Terascale Workshop, Melbourne, Australia, Springer, Berlin, Heidelberg, June 2003.

Hernandez, O., F. Song, B. Chapman, J. Dongarra, B. Mohr, S. Moore, and F. Wolf, “Performance Instrumentation and Compiler Optimizations for MPI/OpenMP Applications,” Second International Workshop on OpenMP, Reims, France, January 2006.

Hernandez, O., F. Song, B. Chapman, J. Dongarra, B. Mohr, S. Moore, and F. Wolf, “Performance Instrumentation and Compiler Optimizations for MPI/OpenMP Applications,” Lecture Notes in Computer Science, OpenMP Shared Memory Parallel Programming, vol. 4315: Springer Berlin / Heidelberg, 2008.

Donfack, S., S. Tomov, and J. Dongarra, “Performance evaluation of LU factorization through hardware counter measurements,” University of Tennessee Computer Science Technical Report, no. ut-cs-12-700, October 2012.

Canning, A., J. Dongarra, J. Langou, O. Marques, S. Tomov, C. Voemel, and L-W. Wang, “Performance evaluation of eigensolvers in nano-structure computations,” IEEE/ACM Proceedings of HPCNano SC06 (to appear), January 2006.

Tomov, S., W. Lu, J. Bernholc, S. Moore, and J. Dongarra, “Performance Evaluation for Petascale Quantum Simulation Tools,” Proceedings of the Cray Users' Group Meeting, Atlanta, GA, May 2010.

Tomov, S., W. Lu, J. Bernholc, S. Moore, and J. Dongarra, “Performance evaluation for petascale quantum simulation tools,” Proceedings of CUG09, Atlanta, GA, May 2009.

Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Performance, Design, and Autotuning of Batched GEMM for GPUs,” The International Supercomputing Conference (ISC High Performance 2016), Frankfurt, Germany, June 2016.

Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Performance, Design, and Autotuning of Batched GEMM for GPUs,” High Performance Computing: 31st International Conference, ISC High Performance 2016, Frankfurt, Germany, June 19-23, 2016, Proceedings, no. 9697: Springer International Publishing, pp. 21–38, 2016.

Abdelfattah, A., A. Haidar, S. Tomov, and J. Dongarra, “Performance, Design, and Autotuning of Batched GEMM for GPUs,” University of Tennessee Computer Science Technical Report, no. UT-EECS-16-739: University of Tennessee, February 2016.

Dongarra, J., H. Jagode, A. Danalis, D. Barry, and V. Weaver, Performance Application Programming Interface for Extreme-Scale Environments (PAPI-EX) (Poster), Seattle, WA, 2020 NSF Cyberinfrastructure for Sustained Scientific Innovation (CSSI) Principal Investigator Meeting, 2020.

Dongarra, J., T. Herault, and Y. Robert, “Performance and Reliability Trade-offs for the Double Checkpointing Algorithm,” International Journal of Networking and Computing, vol. 4, no. 1, pp. 32-41.

Haidar, A., C. Cao, I. Yamazaki, J. Dongarra, M. Gates, P. Luszczek, and S. Tomov, “Performance and Portability with OpenCL for Throughput-Oriented HPC Workloads Across Accelerators, Coprocessors, and Multicore Processors,” 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA '14), New Orleans, LA, IEEE, November 2014.

Anzt, H., S. Tomov, and J. Dongarra, “On the performance and energy efficiency of sparse linear algebra on GPUs,” International Journal of High Performance Computing Applications, October 2016.

Cao, Q., Y. Pei, T. Herault, K. Akbudak, A. Mikhalev, G. Bosilca, H. Ltaief, D. Keyes, and J. Dongarra, “Performance Analysis of Tile Low-Rank Cholesky Factorization Using PaRSEC Instrumentation Tools,” Workshop on Programming and Performance Visualization Tools (ProTools 19) at SC19, Denver, CO, ACM, November 2019.

Pjesivac–Grbovic, J., T. Angskun, G. Bosilca, G. Fagg, E. Gabriel, and J. Dongarra, “Performance Analysis of MPI Collective Operations,” Cluster Computing Journal (to appear), January 2005.

Pjesivac–Grbovic, J., T. Angskun, G. Bosilca, G. Fagg, E. Gabriel, and J. Dongarra, “Performance Analysis of MPI Collective Operations,” Cluster computing, vol. 10, no. 2: Springer Netherlands, pp. 127-143, June 2007.

Pjesivac–Grbovic, J., T. Angskun, G. Bosilca, G. Fagg, E. Gabriel, and J. Dongarra, “Performance Analysis of MPI Collective Operations,” 4th International Workshop on Performance Modeling, Evaluation, and Optimization of Parallel and Distributed Systems (PMEO-PDS '05), Denver, Colorado, April 2005.

Kabir, K., A. Haidar, S. Tomov, and J. Dongarra, “Performance Analysis and Optimization of Two-Sided Factorization Algorithms for Heterogeneous Platform,” International Conference on Computational Science (ICCS 2015), Reykjavík, Iceland, June 2015.

Kabir, K., A. Haidar, S. Tomov, and J. Dongarra, “Performance Analysis and Design of a Hessenberg Reduction using Stabilized Blocked Elementary Transformations for New Architectures,” The Spring Simulation Multi-Conference 2015 (SpringSim'15), Best Paper Award, Alexandria, VA, April 2015.

Haidar, A., B. Brock, S. Tomov, M. Guidry, J. Jay Billings, D. Shyles, and J. Dongarra, “Performance Analysis and Acceleration of Explicit Integration for Large Kinetic Networks using Batched GPU Computations,” 2016 IEEE High Performance Extreme Computing Conference (HPEC '16), Waltham, MA, IEEE, September 2016.

Bhatia, N., S. Moore, F. Wolf, J. Dongarra, and B. Mohr, “A Pattern-Based Approach to Automated Application Performance Analysis,” Workshop on Patterns in High Performance Computing, University of Illinois at Urbana-Champaign, May 2005.

Danalis, A., H. Jagode, G. Bosilca, and J. Dongarra, “PaRSEC in Practice: Optimizing a Legacy Chemistry Application through Distributed Task-Based Execution,” 2015 IEEE International Conference on Cluster Computing, Chicago, IL, IEEE, September 2015.

Bosilca, G., A. Bouteiller, A. Danalis, M. Faverge, T. Herault, and J. Dongarra, “PaRSEC: Exploiting Heterogeneity to Enhance Scalability,” IEEE Computing in Science and Engineering, vol. 15, issue 6, pp. 36-45, November 2013.

Anzt, H., T. Ribizel, G. Flegar, E. Chow, and J. Dongarra, “ParILUT – A Parallel Threshold ILU for GPUs,” IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, IEEE, May 2019.

Anzt, H., E. Chow, and J. Dongarra, “ParILUT - A New Parallel Threshold ILU,” SIAM Journal on Scientific Computing, vol. 40, issue 4: SIAM, pp. C503–C519, July 2018.

Youseff, L., K. Seymour, H. You, D. Zagorodnov, J. Dongarra, and R. Wolski, “Paravirtualization Effect on Single- and Multi-threaded Memory-Intensive Linear Algebra Software,” Cluster Computing Journal: Special Issue on High Performance Distributed Computing, vol. 12, no. 2: Springer Netherlands, pp. 101-122, 2009.

Tisseur, F., and J. Dongarra, “Parallelizing the Divide and Conquer Algorithm for the Symmetric Tridiagonal Eigenvalue Problem on Distributed Memory Architectures,” SIAM Journal on Scientific Computing, vol. 6, no. 20, pp. 2223-2236, October 2002.

Baboulin, M., D. Becker, and J. Dongarra, “A Parallel Tiled Solver for Symmetric Indefinite Systems On Multicore Architectures,” IPDPS 2012, Shanghai, China, May 2012.

Baboulin, M., D. Becker, and J. Dongarra, “A parallel tiled solver for dense symmetric indefinite systems on multicore architectures,” University of Tennessee Computer Science Technical Report, no. ICL-UT-11-07, October 2011.

Buttari, A., J. Langou, J. Kurzak, and J. Dongarra, “Parallel Tiled QR Factorization for Multicore Architectures,” University of Tennessee Computer Science Dept. Technical Report, UT-CS-07-598 (also LAPACK Working Note 190), 2007.

Buttari, A., J. Langou, J. Kurzak, and J. Dongarra, “Parallel Tiled QR Factorization for Multicore Architectures,” Concurrency and Computation: Practice and Experience, vol. 20, pp. 1573-1590, January 2008.

Jia, Y., G. Bosilca, P. Luszczek, and J. Dongarra, “Parallel Reduction to Hessenberg Form with Algorithm-Based Fault Tolerance,” International Conference for High Performance Computing, Networking, Storage and Analysis, IEEE-SC 2013, Denver, CO, November 2013.

Haidar, A., H. Ltaief, and J. Dongarra, “Parallel Reduction to Condensed Forms for Symmetric Eigenvalue Problems using Aggregated Fine-Grained and Memory-Aware Kernels,” Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC11), Seattle, WA, November 2011.

Haidar, A., H. Ltaief, and J. Dongarra, “Parallel Reduction to Condensed Forms for Symmetric Eigenvalue Problems using Aggregated Fine-Grained and Memory-Aware Kernels,” University of Tennessee Computer Science Technical Report, UT-CS-11-677 (also LAPACK Working Note 254), August 2011.

Abalenkovs, M., A. Abdelfattah, J. Dongarra, M. Gates, A. Haidar, J. Kurzak, P. Luszczek, S. Tomov, I. Yamazaki, and A. YarKhan, “Parallel Programming Models for Dense Linear Algebra on Heterogeneous Systems,” Supercomputing Frontiers and Innovations, vol. 2, no. 4, October 2015.

“Parallel Processing and Applied Mathematics, 9th International Conference, PPAM 2011,” Lecture Notes in Computer Science, vol. 7203, Torun, Poland, 2012.

Wyrzykowski, R., E. Deelman, J. Dongarra, and K. Karczewski, “Parallel Processing and Applied Mathematics: 13th International Conference, PPAM 2019, Bialystok, Poland, September 8–11, 2019, Revised Selected Papers, Part II,” Lecture Notes in Computer Science, no. 12044: Springer International Publishing, pp. 503, March 2020.

Wyrzykowski, R., E. Deelman, J. Dongarra, and K. Karczewski, “Parallel Processing and Applied Mathematics: 13th International Conference, PPAM 2019, Bialystok, Poland, September 8–11, 2019, Revised Selected Papers, Part I,” Lecture Notes in Computer Science, no. 12043: Springer International Publishing, pp. 581, March 2020.

Kurzak, J., M. Gates, A. YarKhan, I. Yamazaki, P. Luszczek, J. Finney, and J. Dongarra, “Parallel Norms Performance Report,” SLATE Working Notes, no. 06, ICL-UT-18-06: Innovative Computing Laboratory, University of Tennessee, June 2018.

Henry, G., D. Watkins, and J. Dongarra, “A Parallel Implementation of the Nonsymmetric QR Algorithm for Distributed Memory Architectures,” SIAM Journal on Scientific Computing, vol. 24, no. 1, pp. 284-311, January 2003.

Henry, G., D. Watkins, and J. Dongarra, “A Parallel Implementation of the Nonsymmetric QR Algorithm for Distributed Memory Architectures,” SIAM Journal on Scientific Computing, vol. 16, no. 2, pp. 284-311, October 2002.

Buttari, A., J. Dongarra, J. Kurzak, and J. Langou, “Parallel Dense Linear Algebra Software in the Multicore Era,” in Cyberinfrastructure Technologies and Applications: Nova Science Publishers, Inc., pp. 9-24, 2009.

Ltaief, H., J. Kurzak, and J. Dongarra, “Parallel Block Hessenberg Reduction using Algorithms-By-Tiles for Multicore Architectures Revisited,” University of Tennessee Computer Science Technical Report, UT-CS-08-624 (also LAPACK Working Note 208), August 2008.

Kurzak, J., M. Gates, A. YarKhan, I. Yamazaki, P. Wu, P. Luszczek, J. Finney, and J. Dongarra, “Parallel BLAS Performance Report,” SLATE Working Notes, no. 05, ICL-UT-18-01: University of Tennessee, April 2018.
