Publications

Luszczek, P., D. Bailey, J. Dongarra, J. Kepner, R. Lucas, R. Rabenseifner, and D. Takahashi, “The HPC Challenge (HPCC) Benchmark Suite,” SC06 Conference Tutorial, Tampa, Florida, IEEE, November 2006.

Luszczek, P., M. Gates, J. Kurzak, A. Danalis, and J. Dongarra, “Search Space Generation and Pruning System for Autotuners,” 30th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Chicago, IL, IEEE, May 2016.

Luszczek, P., H. Ltaief, and J. Dongarra, “Two-stage Tridiagonal Reduction for Dense Symmetric Matrices using Tile Algorithms on Multicore Architectures,” IEEE International Parallel and Distributed Processing Symposium (submitted), Anchorage, AK, May 2011.

Luszczek, P., W. M. Sid-Lakhdar, and J. Dongarra, “Combining multitask and transfer learning with deep Gaussian processes for autotuning-based performance engineering,” The International Journal of High Performance Computing Applications, March 2023.

Luszczek, P., and J. Dongarra, “Design of an Interactive Environment for Numerically Intensive Parallel Linear Algebra Calculations,” International Conference on Computational Science, Poland, Springer Verlag, June 2004.

Luszczek, P., “High Performance Development for High End Computing with Python Language Wrapper (PLW),” International Journal of High Performance Computing Applications (to appear), 2006.

Luszczek, P., I. Yamazaki, and J. Dongarra, “Increasing Accuracy of Iterative Refinement in Limited Floating-Point Arithmetic on Half-Precision Accelerators,” IEEE High Performance Extreme Computing Conference (HPEC 2019), Best Paper Finalist, Waltham, MA, IEEE, September 2019.

Luszczek, P., J. Kurzak, and J. Dongarra, “Changes in Dense Linear Algebra Kernels - Decades Long Perspective,” in Solving the Schrödinger Equation: Has everything been tried? (to appear): Imperial College Press, 2011.

Luszczek, P., and J. Dongarra, “Anatomy of a Globally Recursive Embedded LINPACK Benchmark,” 2012 IEEE High Performance Extreme Computing Conference, Waltham, MA, pp. 1-6, September 2012.

Luszczek, P., and J. Dongarra, The PLASMA Library on CORAL Systems and Beyond (Poster), Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.

Luszczek, P., J. Kurzak, and J. Dongarra, “Looking Back at Dense Linear Algebra Software,” Journal of Parallel and Distributed Computing, vol. 74, issue 7, pp. 2548–2560, July 2014.

Luszczek, P., and J. Dongarra, “Analysis of Various Scalar, Vector, and Parallel Implementations of RandomAccess,” Innovative Computing Laboratory (ICL) Technical Report, no. ICL-UT-10-03, June 2010.

Luszczek, P., J. Kurzak, I. Yamazaki, D. Keffer, and J. Dongarra, “Scaling Point Set Registration in 3D Across Thread Counts on Multicore and Hardware Accelerator Platforms through Autotuning for Large Scale Analysis of Scientific Point Clouds,” IEEE International Workshop on Benchmarking, Performance Tuning and Optimization for Big Data Applications (BPOD 2017), Boston, MA, IEEE, December 2017.

Luszczek, P., and C. Brown, “Surrogate ML/AI Model Benchmarking for FAIR Principles' Conformance,” 2022 IEEE High Performance Extreme Computing Conference (HPEC): IEEE, September 2022.

Luszczek, P., and D. Koester, “HPC Challenge v1.x Benchmark Suite,” SC|05 Tutorial - S13, Seattle, Washington, January 2005.

Luszczek, P., J. Kurzak, I. Yamazaki, D. Keffer, V. Maroulas, and J. Dongarra, “Autotuning Techniques for Performance-Portable Point Set Registration in 3D,” Supercomputing Frontiers and Innovations, vol. 5, no. 4, December 2018.

Luszczek, P., J. Kurzak, I. Yamazaki, and J. Dongarra, “Towards Numerical Benchmark for Half-Precision Floating Point Arithmetic,” 2017 IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA, IEEE, September 2017.

Luszczek, P., J. Kurzak, and J. Dongarra, “Looking Back at Dense Linear Algebra Software,” Perspectives on Parallel and Distributed Processing: Looking Back and What's Ahead (to appear), 2012.

Luszczek, P., “Parallel Programming in MATLAB,” The International Journal of High Performance Computing Applications, vol. 23, no. 3, pp. 277-283, July 2009.

Luszczek, P., E. Meek, S. Moore, D. Terpstra, V. M. Weaver, and J. Dongarra, “Evaluation of the HPC Challenge Benchmarks in Virtualized Environments,” 6th Workshop on Virtualization in High-Performance Cloud Computing, Bordeaux, France, August 2011.

Luszczek, P., Y. Tsai, N. Lindquist, H. Anzt, and J. Dongarra, “Scalable Data Generation for Evaluating Mixed-Precision Solvers,” 2020 IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA, USA, IEEE, September 2020.

Luszczek, P., J. Dongarra, D. Koester, R. Rabenseifner, B. Lucas, J. Kepner, J. McCalpin, D. Bailey, and D. Takahashi, Introduction to the HPC Challenge Benchmark Suite, March 2005.

Luo, X., W. Wu, G. Bosilca, Y. Pei, Q. Cao, T. Patinyasakdikul, D. Zhong, and J. Dongarra, “HAN: A Hierarchical AutotuNed Collective Communication Framework,” IEEE Cluster Conference, Kobe, Japan, Best Paper Award, IEEE Computer Society Press, September 2020.

Luo, X., W. Wu, G. Bosilca, T. Patinyasakdikul, L. Wang, and J. Dongarra, “ADAPT: An Event-Based Adaptive Collective Communication Framework,” The 27th International Symposium on High-Performance Parallel and Distributed Computing (HPDC '18), Tempe, Arizona, ACM Press, June 2018.

Lukarski, D., H. Anzt, S. Tomov, and J. Dongarra, “Hybrid Multi-Elimination ILU Preconditioners on GPUs,” International Heterogeneity in Computing Workshop (HCW), IPDPS 2014, Phoenix, AZ, IEEE, May 2014.

Lu, Y., I. Yamazaki, F. Ino, Y. Matsushita, S. Tomov, and J. Dongarra, “Reducing the Amount of out-of-core Data Access for GPU-Accelerated Randomized SVD,” Concurrency and Computation: Practice and Experience, April 2020.

Ltaief, H., P. Luszczek, and J. Dongarra, “High Performance Bidiagonal Reduction using Tile Algorithms on Homogeneous Multicore Architectures,” University of Tennessee Computer Science Technical Report, UT-CS-11-673 (also LAPACK Working Note 247), May 2011.

Ltaief, H., J. Kurzak, and J. Dongarra, “Parallel Block Hessenberg Reduction using Algorithms-By-Tiles for Multicore Architectures Revisited,” University of Tennessee Computer Science Technical Report, UT-CS-08-624 (also LAPACK Working Note 208), August 2008.

Ltaief, H., S. Tomov, R. Nath, P. Du, and J. Dongarra, “A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators,” Proc. of VECPAR'10 (to appear), Berkeley, CA, June 2010.

Ltaief, H., J. Kurzak, and J. Dongarra, “Parallel Band Two-Sided Matrix Bidiagonalization for Multicore Architectures,” IEEE Transactions on Parallel and Distributed Systems (to appear), May 2009.

Ltaief, H., P. Luszczek, and J. Dongarra, “Profiling High Performance Dense Linear Algebra Algorithms on Multicore Architectures for Power and Energy Efficiency,” International Conference on Energy-Aware High Performance Computing (EnA-HPC 2011), Hamburg, Germany, September 2011.

Ltaief, H., J. Kurzak, and J. Dongarra, “Parallel Band Two-Sided Matrix Bidiagonalization for Multicore Architectures,” IEEE Transactions on Parallel and Distributed Systems, pp. 417-423, April 2010.

Ltaief, H., P. Luszczek, and J. Dongarra, “Enhancing Parallelism of Tile Bidiagonal Transformation on Multicore Architectures using Tree Reduction,” Lecture Notes in Computer Science, vol. 7203, pp. 661-670, September 2012.

Ltaief, H., P. Luszczek, and J. Dongarra, “High Performance Bidiagonal Reduction using Tile Algorithms on Homogeneous Multicore Architectures,” ACM Transactions on Mathematical Software (TOMS), vol. 39, issue 3, no. 16, 2013.

Ltaief, H., J. Kurzak, J. Dongarra, and R. M. Badia, “Scheduling Two-sided Transformations using Tile Algorithms on Multicore Architectures,” Journal of Scientific Computing, vol. 18, no. 1, pp. 33-50, 2010.

Ltaief, H., S. Tomov, R. Nath, and J. Dongarra, “Hybrid Multicore Cholesky Factorization with Multiple GPU Accelerators,” IEEE Transactions on Parallel and Distributed Systems (submitted), March 2010.

Losada, N., A. Bouteiller, and G. Bosilca, “Asynchronous Receiver-Driven Replay for Local Rollback of MPI Applications,” Fault Tolerance for HPC at eXtreme Scale (FTXS) Workshop at The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'19), November 2019.

Losada, N., G. Bosilca, A. Bouteiller, P. González, and M. J. Martín, “Local Rollback for Resilient MPI Applications with Application-Level Checkpointing and Message Logging,” Future Generation Computer Systems, vol. 91, pp. 450-464, February 2019.

Losada, N., P. González, M. J. Martín, G. Bosilca, A. Bouteiller, and K. Teranishi, “Fault Tolerance of MPI Applications in Exascale Systems: The ULFM Solution,” Future Generation Computer Systems, vol. 106, pp. 467-481, May 2020.

Lopez, F., and T. Mary, “Mixed Precision LU Factorization on GPU Tensor Cores: Reducing Data Movement and Memory Footprint,” Innovative Computing Laboratory Technical Report, no. ICL-UT-20-13: University of Tennessee, September 2020.

Lopez, F., E. Chow, S. Tomov, and J. Dongarra, “Asynchronous SGD for DNN Training on Shared-Memory Parallel Architectures,” Innovative Computing Laboratory Technical Report, no. ICL-UT-20-04: University of Tennessee, Knoxville, March 2020.

Lopez, M. G., V. Larrea, W. Joubert, O. Hernandez, A. Haidar, S. Tomov, and J. Dongarra, “Towards Achieving Performance Portability Using Directives for Accelerators,” The International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16), Third Workshop on Accelerator Programming Using Directives (WACCPD), Salt Lake City, Utah, Innovative Computing Laboratory, University of Tennessee, November 2016.

Lopez, M. G., W. Joubert, V. Larrea, O. Hernandez, A. Haidar, S. Tomov, and J. Dongarra, “Evaluation of Directive-Based Performance Portable Programming Models,” International Journal of High Performance Computing and Networking, vol. 14, issue 2, pp. 165-182.

Lopez, F., E. Chow, S. Tomov, and J. Dongarra, “Asynchronous SGD for DNN Training on Shared-Memory Parallel Architectures,” Workshop on Scalable Deep Learning over Parallel And Distributed Infrastructures (ScaDL 2020), May 2020.

London, K., S. Moore, P. Mucci, K. Seymour, and R. Luczak, “The PAPI Cross-Platform Interface to Hardware Performance Counters,” Department of Defense Users' Group Conference Proceedings, Biloxi, Mississippi, June 2001.

London, K., J. Dongarra, S. Moore, P. Mucci, K. Seymour, and T. Spencer, “End-user Tools for Application Performance Analysis, Using Hardware Counters,” International Conference on Parallel and Distributed Computing Systems, Dallas, TX, August 2001.

Lively, C., X. Wu, V. Taylor, S. Moore, H-C. Chang, and K. Cameron, “Energy and performance characteristics of different parallel implementations of scientific applications on multicore systems,” International Journal of High Performance Computing Applications, vol. 25, no. 3, pp. 342-350, 2011.

Lively, C., X. Wu, V. Taylor, S. Moore, H-C. Chang, C-Y. Su, and K. Cameron, “Power-Aware Prediction Models of Hybrid (MPI/OpenMP) Scientific Applications,” International Conference on Energy-Aware High Performance Computing (EnA-HPC 2011), Hamburg, Germany, September 2011.

Lindquist, N., P. Luszczek, and J. Dongarra, “Accelerating Restarted GMRES with Mixed Precision Arithmetic,” IEEE Transactions on Parallel and Distributed Systems, June 2021.

Lindquist, N., P. Luszczek, and J. Dongarra, “Improving the Performance of the GMRES Method using Mixed-Precision Techniques,” Smoky Mountains Computational Sciences & Engineering Conference (SMC2020), August 2020.
